1 Abstract

In this report, we detail the complete and exact rulebook that the Great Firewall of China (GFW) applies to Wikipedia.
We call it a “rulebook” (instead of the common term “blacklist”) because we identify not only the blacklisted terms but
also the exact string matching rules deployed by GFW. An efficient probing methodology makes this possible.

GFW blocked Wikipedia outright in the early years but gradually loosened the blockage, first by unblocking all
non-Chinese versions, then by unblocking the Chinese version, except for certain entries deemed harmful by the Chinese
authorities.

There have been some efforts to understand the Wikipedia blacklist. For example, at the time of writing, the site
Greatfire.org [1] tracks ~700 Wikipedia pages, of which ~400 are claimed to be blocked or partially blocked in China.

Wikipedia contains millions of pages, e.g. more than 700,000 articles in the Chinese version and more than
4,240,000 articles in the English version. Testing these pages exhaustively seems a daunting and infeasible task,
hence there has been no well-known attempt to gather the complete blacklist.

While a small sample of the blacklist is useful, the complete picture can be much more powerful in revealing the
inner workings of GFW and its operators. In this study, we devised a methodology which efficiently examines the
entire Wikipedia corpus, hence exposing to the world the complete GFW rulebook for Wikipedia for the first time. In
total, there are 919 rules (excluding URL terms) which are applicable to Wikipedia, affecting 5336 pages in Chinese
Wikipedia and 67 pages in English Wikipedia.

The revealed rulebook also demonstrates that the GFW operation is haphazard and ill-maintained, even though
the Chinese censorship bureaucracy intends it to be thorough and extensive.

To be precise, the findings in this report are on two Wikipedia snapshots: 2013-09-08 for the Chinese version and 
2013-09-04 for the English version. 

In Version 2.0, we studied GFW’s filtering rules for HTTP responses extensively and identified a comprehensive
list (including rules affecting Wikipedia and beyond). This list is small (19 items), but its rules affect many more pages on
Wikipedia and other websites.

1.1 Structure of the Report 

• Section 1 : The abstract. 

• Section 2: The background on GFW and keyword-based filtering. 

• Section 3: The methodology (for HTTP requests). 

• Section 4: Four types of GFW rules for Wikipedia. 

• Section 5: GFW filtering rules for HTTP responses and the methodology (new in Version 2.0). 

• Section 6: Wikipedia specifics, e.g. its content structure and certain features that are relevant to blockage. 

• Section 7: Caveats and cautions when interpreting the list. 

• Section 8: The complete GFW rulebook for Wikipedia. 

• Section 9: Concluding remarks.

• Appendix A: Diagnosis of Greatfire.org’s Wikipedia list. 

• Appendix B: Observations of self censorship attempts on Chinese Wikipedia. 

* Author contact: Email: summer.agony@gmail.com; Twitter: @SummerAgony.
† Version 1.0 was released on October 1, 2013.
2 Background

2.1 Related Work 

There have been several attempts to understand the various blacklists employed by China’s censors.

• Greatfire.org [1] is a website dedicated to tracking censorship in China, including websites (URLs and IPs), the Weibo
search blacklist, blocked search terms on search engines and Wikipedia pages. Their Wikipedia list is very
helpful to our research, and we now have a full diagnosis of that list (see Appendix A).

• Citizen Lab operates the site china-chats.net [2], based on the paper by Crandall et al. [3], which reverse- 
engineered the blacklists employed by two popular IM clients in China. 

• China Digital Times [4] operates a crowd-sourcing project to track the search blacklist applied on Weibo.com. 

• ConceptDoppler, a 2007 paper by Crandall et al. [5], covers GFW and its keyword-based filtering in great detail.
Furthermore, it developed a smart methodology based on LSA (latent semantic analysis) to pick potentially
sensitive terms for probing and identified 122 blacklisted keywords. That list is apparently outdated, but their
probing methodology remains valid, and is the basis of the methodology we use in this report.

The author hopes that the information revealed in this report will significantly increase the “transparency” of GFW.

2.2 GFW and Keyword-based Filtering 

The technical details of GFW have been studied fairly thoroughly for more than 10 years [6] [7]. In summary,
GFW uses a combination of IP blocking, port blocking, DNS poisoning, DPI-based IDS (intrusion detection
system), etc., to disrupt normal internet traffic.

Our focus here is keyword-based filtering, which is being used more and more widely by Chinese censors due
to its precision, flexibility and scalability. In a nutshell, GFW devices apply pattern matching to internet traffic. When
a device finds a match, it sends forged TCP RESET packets to both the server and the client to interrupt the TCP connection.
Worse, the block stays in effect for 90 seconds, leaving the user unable to connect to the server for that duration.
Note that not many users know about the 90-second interval; they usually give up and “learn” that the website is
“unstable”. Some people believe that the site remains blocked “from several minutes to up to an hour”, but I
have not seen documented evidence for these claims.

In fact, when GFW blocks a website, keyword-based filtering is almost always used, either on its own or in
combination with DNS poisoning and/or IP blocking. This is because DNS poisoning can be circumvented by using
alternative DNS servers or by editing the hosts file (the actual filename and path depend on the operating system), and
IP blocking requires “maintenance” work to diligently track IP address changes of these sites. The keyword-based
approach is low-maintenance and highly efficient. In my experience, GFW uses keyword-based blocking for
almost all blocked sites, DNS poisoning for a subset, and IP blocking only for high-impact,
big-name websites.


2.3 HTTP Request Scan and HTTP Response Scan 

There are two types of keyword-based filtering: one is applied to HTTP requests, the other to
HTTP responses. It is very important to distinguish between the two, because they are two separate systems. Both the
filtering rules and the user experience are entirely different.

An HTTP request is sent by the user’s browser to a web server to ask for content. It is well structured, and HTTP requests
are much smaller than HTTP responses both in size and quantity. GFW has about 1600 filtering rules for HTTP requests
(excluding website URLs), ranging from very general terms to arcane strings. We identified a complete set of
919 rules (excluding website URLs) that affect Wikipedia. In contrast, GFW’s filtering rules for HTTP responses
are much more specific. Our extensive study identified only 19 of them. Even though we cannot claim this small list
is complete, we have high confidence that we have not missed many. This drastic difference arises because scanning HTTP
responses is far more complicated and costly than scanning HTTP requests.

These two types of filtering result in different user experiences. If a user visits a page that offends GFW’s HTTP
request scan, they will almost immediately get a “connection reset” error in the browser. If the page URL passes GFW’s
HTTP request scan but the page content contains something that offends GFW’s HTTP response scan, they will often see a
partially loaded page which just hangs there.

The blocking rates for these two types are very different as well. For HTTP request scan, in my hundreds of 
thousands of tests, GFW reset more than 99% of the offending HTTP requests. For HTTP response scan, the reset rates 
have wide variation among different tested IPs and rules, and they are usually in the range of 60% to 95%. 

GFW’s filtering for HTTP requests is much easier to study, and has indeed been studied much more, than that for HTTP
responses. In particular, Version 1.0 of this report covers GFW’s filtering rules for HTTP requests, not HTTP responses.
Version 2.0 adds the findings on GFW’s HTTP response filtering.

2.4 HTTP Request Scan Details 

We examined how exactly GFW scans HTTP requests. For illustration, consider the following HTTP request for the
page http://en.wikipedia.org/wiki/Xu_Zhiyong:

GET /wiki/Xu_Zhiyong HTTP/1.1\r\n
Host: en.wikipedia.org\r\n
Connection: keep-alive\r\n
Referer: http://en.wikipedia.org/wiki/Main_Page\r\n
User-Agent: [omitted]\r\n
[other headers omitted]

From extensive study, we found that GFW pulls out the Host field (en.wikipedia.org in the above example)
and the GET target (/wiki/Xu_Zhiyong HTTP/1.1 in the above example) and concatenates the two, i.e.
en.wikipedia.org/wiki/Xu_Zhiyong HTTP/1.1. GFW then applies its rulebook to this string and checks
whether it matches any rule. GFW does not seem to care about other HTTP request types such as POST or HEAD; these
request types are not for retrieving page content.

A common (and quite natural) hypothesis is that GFW checks the Host field to see if it is Wikipedia and checks 
the GET target to see if it contains a sensitive term. We tested it extensively and found no indication of this, or any 
other way GFW handles the Host field and the GET target other than as described in the previous paragraph. This 
is understandable from an implementation efficiency perspective. My theory is that GFW has one component to do 
the extraction (i.e. identifying HTTP request, pulling out Host and GET fields and concatenating), and a separate 
component to do the pattern matching. 
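
The extraction step described above can be illustrated with a minimal Python sketch. It is not GFW's implementation, only a model of the observed behavior; the function name gfw_scan_string is hypothetical, and the sketch assumes a well-formed request whose header lines end in CRLF.

```python
from typing import Optional


def gfw_scan_string(raw_request: str) -> Optional[str]:
    """Model of the string the report says GFW matches: Host + GET target."""
    lines = raw_request.split("\r\n")
    request_line = lines[0]
    if not request_line.startswith("GET "):
        # The report observes that only GET requests are scanned.
        return None
    # Everything after "GET ", e.g. "/wiki/Xu_Zhiyong HTTP/1.1".
    target = request_line[len("GET "):]
    for line in lines[1:]:
        if line == "":
            break  # end of headers
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            return value.strip() + target
    return None


request = (
    "GET /wiki/Xu_Zhiyong HTTP/1.1\r\n"
    "Host: en.wikipedia.org\r\n"
    "Connection: keep-alive\r\n"
    "\r\n"
)
print(gfw_scan_string(request))
```

For the example request this yields the concatenated string quoted in the text, to which the pattern matching component would then be applied.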

2.5 GFW and Wikipedia 

For Wikipedia, GFW blocked the entire site on June 3, 2004. The block was temporarily lifted several times after that.
After a meeting between Wikipedia founder Jimmy Wales and the Chinese authorities in September 2008, the site became
mostly accessible. The Chinese authorities switched to a keyword-based approach to block access to certain individual
pages, which is the subject of this study.

Wikipedia does offer an HTTPS version which bypasses GFW’s keyword-based filtering, but the HTTPS version 
is usually blocked via a host-port block. The most recent block started on May 31, 2013 and remains on, or partially 
on. 


3 Methodology (HTTP Request Scan) 

This section describes the methodology used to study GFW’s filtering rules for HTTP requests. Section 5 covers the
methodology used to study GFW’s HTTP response filtering, and the findings.

3.1 Wikipedia Dump 

Wikipedia offers content dumps, from which we can gather all entries in the current or previous snapshots. In this study,
we downloaded the Chinese and English dumps on Sep 8, 2013, from http://dumps.wikimedia.org/zhwiki/20130908/
and http://dumps.wikimedia.org/enwiki/20130904/ respectively.
Field              Type                 Null  Key  Default  Extra
-----------------  -------------------  ----  ---  -------  --------------
page_id            int(10) unsigned     NO    PRI  NULL     auto_increment
page_namespace     int(11)              NO    MUL  NULL
page_title         varbinary(255)       NO         NULL
page_restrictions  tinyblob             NO         NULL
page_counter       bigint(20) unsigned  NO         0
page_is_redirect   tinyint(3) unsigned  NO    MUL  0
page_is_new        tinyint(3) unsigned  NO         0
page_random        double unsigned      NO    MUL  NULL
page_touched       binary(14)           NO
page_latest        int(10) unsigned     NO         NULL
page_len           int(10) unsigned     NO    MUL  NULL


Table 3.1: Database schema for the page table of Wikipedia 


Information about the Wikipedia entries is offered as a downloadable MySQL database dump. We mainly work
with the page table, whose schema is shown in Table 3.1.

Each record in the page table corresponds to one Wikipedia page, whose URL can be constructed from
page_namespace and page_title. The following are two examples (more details in Table 6.1).

• Record [page_id=221324; page_namespace=0; page_title=“Princelings”] corresponds to the URL:
http://en.wikipedia.org/wiki/Princelings (an article).

• Record [page_id=2791496; page_namespace=1; page_title=“Princelings”] corresponds to the URL:
http://en.wikipedia.org/wiki/Talk:Princelings (the Talk page for that article).
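
As a rough illustration of this mapping, the following Python sketch builds a page URL from a record of the page table. The NAMESPACE_PREFIX table and the page_url function are hypothetical names introduced here; the table covers only a few namespaces and assumes the English namespace names (Chinese Wikipedia also uses localized prefixes).

```python
from urllib.parse import quote

# Assumed English prefixes for a few namespace ids; not a complete list.
NAMESPACE_PREFIX = {
    0: "",           # Article
    1: "Talk",
    2: "User",
    6: "File",
    10: "Template",
    12: "Help",
    14: "Category",
}


def page_url(lang: str, page_namespace: int, page_title: str) -> str:
    """Build the page URL from page_namespace and page_title."""
    prefix = NAMESPACE_PREFIX[page_namespace]
    full_title = f"{prefix}:{page_title}" if prefix else page_title
    # Percent-encode non-ASCII bytes as UTF-8; keep ":" and "/" readable.
    return f"http://{lang}.wikipedia.org/wiki/{quote(full_title, safe=':/')}"


print(page_url("en", 0, "Princelings"))
print(page_url("en", 1, "Princelings"))
```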

We did not test all entries in this table, since some of these entries are insignificant or not applicable. To be precise,
for the Chinese version, we tested all entries except for MediaWiki (namespace=8) and MediaWiki talk (namespace=9);
for the English version, we tested all entries in Article (namespace=0), User (namespace=2), Project (namespace=4),
File (namespace=6), Template (namespace=10), Help (namespace=12), and Category (namespace=14). In total, we
examined 3,078,365 entries for the Chinese version and 20,631,910 entries for the English version. The actual number
of examined Chinese Wikipedia pages is one order of magnitude larger, because there are 9 variants of the Chinese
Wikipedia (see Section 6.2). Section 6 contains more details about Wikipedia content.

3.2 Probing 

Our probing is based on the probing methodology described in the ConceptDoppler paper [5]. Basically, we first
establish a valid TCP connection to a destination across GFW, then we send probing TCP packets and observe the response.
For more details, please read that well-written paper, which is freely available online.

Our main improvements are in the following three areas: 

• Firstly, in one probing request, we can put in multiple phrases for testing. As we have described in Section 2.2,
to test if a Wikipedia URL is on the GFW rulebook, we can put the entire URL (or multiple such URLs) in the GET
target of the HTTP request for testing. This dramatically increases the probing efficiency, since the majority of
Wikipedia pages do not trigger GFW reset.

There is a limit on the size of the probing request. When the request is too long, it may get truncated, or the
destination host may return various errors which may “pollute” the signal we are looking to capture. This size
limit seems to vary among destination hosts. So it is prudent not to test too many phrases in one
run, and it is good practice to verify that the size is appropriate for a destination host, e.g. by examining the false positive
rate and false negative rate when sending known offending requests or known non-offending requests of varying
size.

• Secondly, as we have described in Section 2.2, we can perform the test on any web server as long as it is on the
other side of GFW. It does not need to be a Wikipedia server. Furthermore, when GFW sees an offending packet,
it only resets the connection between the client IP and the server IP and blocks it for 90 seconds. Connections
between the client IP and other hosts are not affected. So we can run many tests simultaneously. In fact, we ran
dozens of probing threads concurrently in this study. This is the crucial reason why we can examine millions of
Wikipedia URLs using just one machine in a short period.

• Lastly, we built a pipeline to streamline the entire probing process. For example, if one probing request containing
20 Wikipedia URLs triggers reset, the program uses an efficient search algorithm to find which URL(s)
is the culprit. Furthermore, given one “offending” URL, we wrote a program that automatically determines
the actual GFW string matching rule that it offends. Section 3.3 is devoted to this aspect.

With these improvements, our program can examine tens of millions of URLs in a reasonable time frame. 
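
The report does not specify which search algorithm the pipeline uses to find the culprit URL(s) in a batch. One plausible choice is recursive bisection, sketched below in Python. The function find_offenders and the triggers_reset callback are hypothetical; in a real run the callback would send one probing request containing the given URLs and report whether a reset was observed.

```python
from typing import Callable, List, Sequence


def find_offenders(
    urls: Sequence[str],
    triggers_reset: Callable[[Sequence[str]], bool],
) -> List[str]:
    """Return the URLs in a batch that trigger a reset on their own."""
    if not urls or not triggers_reset(urls):
        return []  # clean batch: one probe clears every URL in it
    if len(urls) == 1:
        return [urls[0]]
    mid = len(urls) // 2
    # Probe each half separately and recurse into the halves that trigger.
    return (find_offenders(urls[:mid], triggers_reset)
            + find_offenders(urls[mid:], triggers_reset))


# Simulated oracle standing in for a real probe across GFW.
blocked = {"zh.wikipedia.org/wiki/B", "zh.wikipedia.org/wiki/Q"}
batch = [f"zh.wikipedia.org/wiki/{c}" for c in "ABCDEFGHIJKLMNOPQRST"]
print(find_offenders(batch, lambda group: any(u in blocked for u in group)))
```

Since most batches are clean, a single probe usually clears all 20 URLs, and the extra probes are spent only on batches that trigger. The sketch assumes that each offending URL triggers by itself; real probes are noisy and would need to be repeated.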

That said, there are many nitty-gritty issues with GFW and Wikipedia, so significant babysitting and manual
inspection are still required. But that is manageable and not more than a few days of work for an individual. More
importantly, this manual inspection and investigation greatly improved the author’s understanding of GFW, and of Wikipedia
as well.


3.3 Identifying GFW String Matching Rules 

It is well known that identifying regular expressions with no constraints is costly and difficult. In the GFW case, we
make one assumption which makes it much easier. GFW needs to be efficient, so it is reasonable to assume that it does
not use unnecessarily complex regular expressions. Indeed, the hundreds of cases I’ve examined show that GFW only uses
simple plain string matching like the following:

• Single: $target_string ~ $term

• Double: $target_string ~ $term_1 && $target_string ~ $term_2

• Triple: $target_string ~ $term_1 && $target_string ~ $term_2 && $target_string ~ $term_3

• or more

In all the cases I’ve examined, GFW never uses regex features like metacharacters, character classes, or boolean
“or” (boolean “or” can be viewed as equivalent to multiple rules). This is not unexpected, because GFW devices need
to scan the huge volume of internet traffic between China and the world in real time. For this purpose, plain string matching
combined with boolean “and” is the most cost-effective approach.

However, GFW is case-insensitive for all cases tested. 
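
The matching model assumed here (plain substrings joined by boolean “and”, compared case-insensitively) can be written down as a short Python sketch. The function rule_matches is a hypothetical name; it models the observed behavior and says nothing about how GFW devices implement it.

```python
from typing import Sequence


def rule_matches(terms: Sequence[str], target_string: str) -> bool:
    """True if every term of the rule occurs in the target, ignoring case."""
    target = target_string.lower()
    return all(term.lower() in target for term in terms)


target = "ZH.wikipedia.org/wiki/%e6%b1%aa%e6%b4%8b HTTP/1.1"
print(rule_matches(["zh.wikipedia.org", "%E6%B1%AA%E6%B4%8B"], target))
print(rule_matches(["en.wikipedia.org", "%E6%B1%AA%E6%B4%8B"], target))
```

A single-term rule is simply the case where the list has one element, and boolean “or” corresponds to checking several such rules in turn.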

Once we establish this assumption, we can very quickly identify the GFW rule given a string (Wikipedia page URL 
in this case) which triggers GFW reset. We will explain the general procedure below. To help with explanation, we use 
the following notations: 

• S is a string. 

• L(S) is the string S with its leftmost character removed. 

• R(S) is the string S with its rightmost character removed. 

• concat(S1, S2) is the concatenation of two strings S1 and S2. Note that in order to avoid odd cases where the
concatenation introduces new sensitive terms, we actually add two whitespaces between S1 and S2 here.

• M(S) is short for concat(L(S), R(S)). 

For example, suppose the string “en.wikipedia.org/wiki/Princelings” (S) triggers GFW reset. We then check the strings
“n.wikipedia.org/wiki/Princelings” (L(S)) and “en.wikipedia.org/wiki/Princeling” (R(S)) to see if they trigger reset. If
neither triggers, we check M(S); if that still does not trigger, then we are certain that the string S itself is on the GFW rulebook.
In this (quite special) example, both L(S) and R(S) trigger reset, so we need to strip more characters from the beginning
and the end, and repeat the process. The actual GFW rule turns out to be “.wikipedia.org/wiki/Princeling” (T). For
a string T to be a GFW rule that does not involve boolean “and”, the necessary and sufficient condition (under the
aforementioned assumption) is that T triggers reset but M(T) does not. Here, M(T) is
“wikipedia.org/wiki/Princeling  .wikipedia.org/wiki/Princelin”.
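
The single-term case of this procedure can be sketched in Python as follows. The helper names follow the notation above; find_single_term_rule and the triggers callback are hypothetical, and the callback stands in for a real (and noisy) probe across GFW.

```python
from typing import Callable, Optional


def L(s: str) -> str:
    return s[1:]            # leftmost character removed


def R(s: str) -> str:
    return s[:-1]           # rightmost character removed


def concat(s1: str, s2: str) -> str:
    return s1 + "  " + s2   # two whitespaces, as described in the text


def M(s: str) -> str:
    return concat(L(s), R(s))


def find_single_term_rule(s: str, triggers: Callable[[str], bool]) -> Optional[str]:
    """Shrink s to the minimal triggering substring, if the rule has one term."""
    if not triggers(s):
        return None
    while len(s) > 1 and triggers(L(s)):
        s = L(s)            # strip from the left while it still triggers
    while len(s) > 1 and triggers(R(s)):
        s = R(s)            # then strip from the right
    if triggers(M(s)):
        return None         # M(s) still triggers: probably an "and" rule
    return s


# Simulated oracle for the rule reported in the text.
def oracle(x: str) -> bool:
    return ".wikipedia.org/wiki/princeling" in x.lower()


print(find_single_term_rule("en.wikipedia.org/wiki/Princelings", oracle))
```

Each call to the oracle corresponds to one probe, so the number of probes grows with the number of characters stripped.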

In other cases, S triggers reset, R(S) and L(S) do not, but M(S) does. This means GFW is using boolean “and”
here, i.e. GFW’s rule is “T1 && T2” where T1 is a substring of L(S) and T2 is a substring of R(S). In this case, we
hold R(S) unchanged while repeating the procedure on L(S), until we find a substring of L(S) (denoted as T1) where
concat(T1, R(S)) triggers reset but concat(M(T1), R(S)) does not. Then we hold T1 and repeat this procedure for R(S),
until we find a substring of R(S) (denoted as T2) where concat(T1, T2) triggers reset but concat(T1, M(T2)) does not.
Then we recheck concat(M(T1), T2); if that does not trigger reset, we are certain that the GFW rule is “T1 && T2”.

This procedure also applies to “and” rules with more than two parts. For all Wikipedia-related cases, we have not
observed any rules involving more than two parts. Also, for rules like “T1 && T2”, we examined whether GFW might
be using a regex rule like “T1.*T2”, but we never observed that.

This procedure may seem to require many steps. Luckily, for the majority of cases the GFW rule is standard, so the search can
be a lot more efficient. For a given string S that triggers reset, we first check M(S). If M(S) does not trigger reset, then S
itself is the rule and we are done. If it does, then we suspect the rule consists of two strings joined by boolean “and”.
We hypothesize that T1 is the domain part (e.g. “zh.wikipedia.org”) and T2 is the page title, so we check
concat(M(T1), T2), concat(T1, M(T2)) and concat(T1, T2). If the first two do not trigger reset and the last does, then our
hypothesis holds and we are done. Otherwise, we need to do more testing. In the entire study, there are fewer than 20
non-standard cases (i.e. the “corner cases” in Table 4.2).
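
The shortcut for the standard two-term case can be sketched in Python like this. The function is_domain_and_title_rule and the triggers callback are hypothetical names, and the helpers are repeated so that the sketch stands alone.

```python
from typing import Callable


def L(s: str) -> str:
    return s[1:]


def R(s: str) -> str:
    return s[:-1]


def concat(s1: str, s2: str) -> str:
    return s1 + "  " + s2


def M(s: str) -> str:
    return concat(L(s), R(s))


def is_domain_and_title_rule(t1: str, t2: str, triggers: Callable[[str], bool]) -> bool:
    """Test the hypothesis that the rule is exactly 't1 && t2'."""
    return (
        triggers(concat(t1, t2))             # both parts intact: must trigger
        and not triggers(concat(M(t1), t2))  # t1 damaged: must not trigger
        and not triggers(concat(t1, M(t2)))  # t2 damaged: must not trigger
    )


# Simulated oracles standing in for real probes.
def standard(x: str) -> bool:
    return "zh.wikipedia.org" in x and "汪洋" in x


def corner(x: str) -> bool:
    return "wikipedia.org" in x and "汪洋" in x


print(is_domain_and_title_rule("zh.wikipedia.org", "汪洋", standard))
print(is_domain_and_title_rule("zh.wikipedia.org", "汪洋", corner))
```

When the hypothesis fails, as with the second oracle, the rule is one of the non-standard cases and the full character-stripping procedure is needed.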

Note that these tests may give false positives or false negatives. So we need to repeat each test multiple times,
and for the corner cases, we conduct dozens of tests to verify that it is indeed a non-standard rule. Furthermore, after we
identify a rule, we also examine its traditional or simplified versions, even when those versions do not have a
Wikipedia page. With all these rigorous tests, the author claims with full confidence that all rules reported in this
document are accurate and precise.


4 GFW Rules for Wikipedia 

The keyword-based filtering part of GFW can be viewed as a collection of string matching rules. This collection is
applied to all HTTP requests that pass through a GFW device. Understandably, some (or more precisely, many) of
these rules do not target Wikipedia, which is the focus of this study. So we only cover the rules affecting Wikipedia.
We can group these into four types (Table 4.1).


Rule Type                     Affects                                                Count  Example      String Matching Rule for the Example
----------------------------  -----------------------------------------------------  -----  -----------  ------------------------------------
Broad (targeting Wikipedia)   any Wikipedia page whose title contains the string       324  [illegible]  俄羅斯 & zh.wikipedia.org
Prefix (targeting Wikipedia)  any Wikipedia page whose title starts with the string    577  Charter 08   zh.wikipedia.org/wiki/[illegible]
                                                                                                         en.wikipedia.org/wiki/Charter 08
SELF (Non-URL)                all internet traffic, including Wikipedia                 18  延安日记     延安日记
URL                           all internet traffic, including Wikipedia                 37  blogspot.com blogspot.com

Table 4.1: Four types of GFW rules affecting Wikipedia 


• $term & zh.wikipedia.org. In this case, if $term appears anywhere in the Wikipedia page’s title, the
article is blocked (for that version). We’ll refer to this type as a “broad” match. There are 324 rules of this
type. Although most such rules affect only one page, some affect many more, e.g. 俄羅斯¹ &
zh.wikipedia.org affects 688 pages (284 articles and 404 other pages) and 1989年² & zh.wikipedia.org
affects 232 pages (105 articles and 127 other pages).

A notable phenomenon is that there is currently not a single rule like $term & en.wikipedia.org, i.e., all
broad rules target the Chinese Wikipedia. All but 8 of the broad rules have the form $term & zh.wikipedia.org.
Table 4.2 lists the corner cases where the Wikipedia part is not zh.wikipedia.org, together with the actual
terms.

• zh.wikipedia.org/wiki/$term or en.wikipedia.org/wiki/$term. In this case, if the article’s
title starts with $term, it is blocked. We’ll refer to this type as “prefix”. There are 577 rules of this type. Most of
the time only one article matches the rule, namely the one whose title equals $term, but there
are cases where one rule affects many articles. For example, the rule zh.wikipedia.org/wiki/[illegible]³
affects 18 articles.

Note that Chinese Wikipedia has 9 variants (see Section 6.2), so the zh.wikipedia.org/wiki/ part could
take other forms (e.g. zh.wikipedia.org/zh-cn/). There are 4 corner cases where the Wikipedia
part is not zh.wikipedia.org/$variant/ or en.wikipedia.org/wiki/. See Table 4.2.

¹ 俄羅斯: Russia, in traditional Chinese.

² 1989年: Year 1989. This is the year of the Tiananmen Protest, a.k.a. the June 4th Movement.

• $term where $term is not a URL. In this case, any HTTP GET request that matches $term will trigger reset.
For example, when one searches for the term on a search engine across GFW, the connection will be blocked. In
our case, if a Wikipedia page’s title is $term, or contains $term, it will be blocked. For example, 延安日记⁴
by itself is a GFW rule, so the Wikipedia page http://zh.wikipedia.org/wiki/延安日记 is blocked.
This type is relatively rare; only 18 such rules affect Wikipedia.

• $term where $term is a URL. These rules are intended to block the sites themselves, but they also have consequences
for Wikipedia if Wikipedia has an article about such a site. For example, blogspot.com is on the GFW rulebook, so the
Wikipedia page http://en.wikipedia.org/wiki/Blogspot.com is blocked. The GFW rulebook contains a large number
of URLs, and many Wikipedia pages are affected. There are 7 terms that affect a Chinese Wikipedia page (all
pages considered) and 30 terms that affect an English Wikipedia article (i.e. namespace=0). There are many
more that affect English non-article pages, which are omitted from this report.


Rule Type       String Matching Rule              Count  $term (corner cases only)
--------------  --------------------------------  -----  ------------------------------------------------
broad           zh.wikipedia.org & $term            316

broad corner cases:
broad           wikipedia & $term                     2  [illegible]
broad           wikipedia.org & $term                 3  [illegible]
broad           zh.wikipedia & $term                  1  [illegible]
broad           zh.wikipedia.org/w & $term            1  [illegible]
broad           zh.wikipedia.org/zh-hk & $term        1  5YWt5Zub5LqL5Lu2

prefix          zh.wikipedia.org/wiki/$term         259
prefix          zh.wikipedia.org/zh/$term            35
prefix          zh.wikipedia.org/zh-cn/$term         85
prefix          zh.wikipedia.org/zh-tw/$term         51
prefix          zh.wikipedia.org/zh-hk/$term         36
prefix          zh.wikipedia.org/zh-sg/$term         26
prefix          zh.wikipedia.org/zh-hans/$term       29
prefix          zh.wikipedia.org/zh-hant/$term       27
prefix          en.wikipedia.org/wiki/$term          20

prefix corner cases:
prefix          wiki/$term                            1  20th_anniversary_Tiananmen_square_incident_march
prefix          pedia.org/zh-hant/$term               1  User:Liangent-bot/Base64URL/5YWt5Zub5LqL5Lu2
prefix          zh-cn/$term                           1  User:Liangent-bot/Base64URL/5YWt5Zub5LqL5Lu2
prefix          .wikipedia.org/wiki/$term             1  Princeling
prefix          gan.wikipedia.org/wiki/$term          1  [illegible]
prefix          zh-yue.wikipedia.org/wiki/$term       4  [illegible]

SELF (non-URL)  $term                                18
SELF (URL)      $term                                37



Table 4.2: GFW rulebook summary 


³ [illegible]: president of the Republic of China (a.k.a. Taiwan), in traditional Chinese.

⁴ 延安日记: Diary in Yan’an, a book by Soviet diplomat Peter Vladimirov, which covers the history of the Chinese Communist
Party from 1942 to 1945. Published in English with the titles Chinese Special Zone: 1942-1945 and The Vladimirov Diaries.
4.1 Encodings

In China, there are three encoding schemes for Chinese characters commonly in use: UTF-8, GBK, Big5. UTF-8 is the 
international standard. GBK is for simplified Chinese and it is the predominant encoding scheme in mainland China. 
Big5 is for traditional Chinese and is used mainly by Taiwan and Hong Kong websites. GBK and Big5 were developed 
from very early on, and they use 2 bytes to represent each Chinese character, compared to 3 bytes by UTF-8, so they 
are still in wide use. 

Wikipedia seems to support only UTF-8, so the main testing in this study is in UTF-8. However, I did study the
other two encoding schemes extensively and found that GFW also has the GBK and Big5 forms on its rulebook. Each rule is
actually three rules, except for cases where there is no applicable GBK or Big5 encoding.

For example, when zh.wikipedia.org & 汪洋⁵ is on the GFW rulebook, three string matching rules are deployed
to the GFW devices:

• zh.wikipedia.org & %E6%B1%AA%E6%B4%8B (UTF-8)

• zh.wikipedia.org & %CD%F4%D1%F3 (GBK)

• zh.wikipedia.org & %A8%4C%AC%76 (Big5)
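
The three variants of a rule can be derived mechanically, as the following Python sketch suggests. The function names percent_encode and rule_variants are hypothetical, and the sketch relies on Python's built-in utf-8, gbk and big5 codecs, which may differ in edge cases from whatever tooling the GFW operators use.

```python
from typing import Dict


def percent_encode(term: str, encoding: str) -> str:
    """Percent-encode every byte of the term in the given encoding."""
    return "".join(f"%{byte:02X}" for byte in term.encode(encoding))


def rule_variants(prefix: str, term: str) -> Dict[str, str]:
    """Return one 'prefix & term' rule per applicable encoding."""
    variants = {}
    for encoding in ("utf-8", "gbk", "big5"):
        try:
            variants[encoding] = f"{prefix} & {percent_encode(term, encoding)}"
        except UnicodeEncodeError:
            # No applicable encoding, e.g. a simplified-only character in Big5.
            continue
    return variants


for encoding, rule in rule_variants("zh.wikipedia.org", "汪洋").items():
    print(encoding, rule)
```

A term that cannot be represented in one of the encodings simply yields fewer than three variants, which corresponds to the exception mentioned in the text.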

4.2 The 64-byte Limit 

This study also revealed that GFW’s string matching rules have a hard 64-byte limit. All rules that are longer than
64 bytes are truncated to 64 bytes at deployment. This is supporting evidence that GFW devices are custom-made
hardware, built to run string matching highly efficiently.

The first case that came to my attention is the page “zh.wikipedia.org/wiki/2013年南方周末新年献词被删改事件”⁶.
This string triggers GFW reset, but it turned out that the last character is not required. Counting bytes shows
that the full string is 68 bytes, and removing the last character reduces it to 65. In fact, when we test the byte strings, we see
that the first 64 bytes are the culprit, e.g., the 63-byte string “zh.wikipedia.org/wiki/2013%E5%B9%B4%E5%8D%97
%E6%96%B9%E5%91%A8%E6%9C%AB%E6%96%B0%E5%B9%B4%E7%8C%AE%E8%AF%8D%E8%A2%AB
%E5%88%A0%E6%94%B9%E4” does not trigger reset, but the 64-byte string “zh.wikipedia.org/wiki/2013%E5%B9
%B4%E5%8D%97%E6%96%B9%E5%91%A8%E6%9C%AB%E6%96%B0%E5%B9%B4%E7%8C%AE%E8%AF
%8D%E8%A2%AB%E5%88%A0%E6%94%B9%E4%BA” does, where the last two bytes “%E4%BA” are the first 2
bytes of the UTF-8 code for the character 事 (“%E4%BA%8B”).

Furthermore, we examine the GBK encoding of this string. GBK encoding represents each Chinese character 
by two bytes, so the full string is within the 64-byte limit. Indeed, the full string is required to trigger GFW reset: 
“zh.wikipedia.org/wiki/2013%C4%EA%C4%CF%B7%BD%D6%DC%C4%A9%D0%C2%C4%EA%CF%D7%B4 
%CA%B1%BB%C9%BE%B8%C4%CA%C2%BC%FE”. If we remove the last character or the last byte then it won’t 
trigger. 

Combining these data points, we can see that when GFW operator receives an order, it considers all three encodings, 
and if any exceeds 64 bytes, it’ll truncate that one at 64 bytes. 
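
The byte counts in this section can be reproduced with a short Python sketch. The page title used below is decoded from the percent-encoded bytes quoted above; the function deployed_rule is a hypothetical name that models the truncation, not GFW's actual deployment code.

```python
LIMIT = 64  # the hard limit reported in this section


def deployed_rule(order: str, encoding: str) -> bytes:
    """Encode the ordered string and cut it at 64 bytes, even mid-character."""
    return order.encode(encoding)[:LIMIT]


order = "zh.wikipedia.org/wiki/2013年南方周末新年献词被删改事件"

utf8 = order.encode("utf-8")
gbk = order.encode("gbk")
print(len(utf8), len(order[:-1].encode("utf-8")), len(gbk))

# The UTF-8 rule is cut inside the character 事; the GBK rule is kept whole.
print(deployed_rule(order, "utf-8")[-2:].hex())
print(deployed_rule(order, "gbk") == gbk)
```

The sketch shows why the UTF-8 rule ends in the middle of a character while the GBK rule, at two bytes per Chinese character, still requires the full title.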

There are actually quite a few such cases in GFW rulebook. Combining the byte probing and examination of 
GBK/Big5 probing, we can identify the actual order that the GFW operator received. See Table 4.3. 


The 64-byte Limit: Triggering String and the Actual Order

Triggering string (UTF-8)  zh.wikipedia.org/wiki/2013[illegible]
Actual order               zh.wikipedia.org/wiki/2013[illegible]

Triggering string (UTF-8)  zh.wikipedia.org/zh-tw/2013[illegible]
Actual order               zh.wikipedia.org/zh-tw/2013[illegible]

Triggering string (UTF-8)  zh.wikipedia.org/zh-hans/[illegible]
Actual order               zh.wikipedia.org/zh-hans/[illegible]

Triggering string (UTF-8)  zh.wikipedia.org/wiki/Wikipedia:[illegible]/2007[illegible]
Actual order               zh.wikipedia.org/wiki/Wikipedia:[illegible]/2007[illegible]

Triggering string (UTF-8)  zh.wikipedia.org/wiki/Wikipedia:[illegible]/2010[illegible]
Actual order               zh.wikipedia.org/wiki/Wikipedia:[illegible]/2010[illegible]

Triggering string (UTF-8)  zh.wikipedia.org/wiki/Talk:2013[illegible]
Actual order               zh.wikipedia.org/wiki/Talk:2013[illegible]

Triggering string (UTF-8)  zh.wikipedia.org/zh-hant/Talk:2013[illegible]
Actual order               zh.wikipedia.org/zh-hant/Talk:2013[illegible]

Triggering string (UTF-8)  zh.wikipedia.org/wiki/Talk:2013[illegible]
Actual order               zh.wikipedia.org/wiki/Talk:2013[illegible]

Table 4.3: Long strings that are truncated due to the 64-byte limit.

⁵ 汪洋: Wang Yang, Vice Premier of China, member of the 17th and 18th Politburo.

⁶ 2013年南方周末新年献词被删改事件: 2013 Southern Weekly Incident, a conflict between the Southern Weekly editors and
the Party propaganda branch.


5 GFW’s HTTP Response Scan: Methodology and Rulebook

5.1 Methodology (HTTP Response Scan)

We use a combination of the following to study GFW’s HTTP response filtering.

• cURL is a software project for transferring data on the web. In particular, we use the command line
tool curl to fetch web content. For this research, the most desirable feature of curl is the -r flag, which specifies
a byte range of the page content to transfer. Wikipedia content is static, so repeated curl commands with
refined byte ranges can pinpoint the exact location in the page content that offends GFW. Note that after each curl
test, we need to run a “follow-up” innocent curl request to check whether the connection has been reset.

• Custom web server. We set up a simple Apache web server so that we can freely tweak the content being
transferred. This is indispensable for confirming the exact string matching rules.

• Editing Wikipedia content. This is necessary because we found that a subset of GFW’s HTTP response filtering 
rules only affect the Wikipedia site, not my custom web server, or other websites. Since we do not want to 
pollute Wikipedia content, this is done in user testing pages. 

• HTTP proxy. Probing via a proxy in China can give us some information about user experience if the HTTP
request and response are between that proxy IP and a target server IP. Using multiple proxies can also increase
our probing efficiency. A web search gives many public proxies, for example, the list at
http://spys.ru/free-proxy-list/cn/. These proxies can help to reveal the topology of GFW operations. We found
that different proxies may give different results when fetching offending content from non-offending URLs, which
suggests that GFW’s HTTP response filtering is more distributed, and likely deployed at multiple locations at the
ISP level. In contrast, for HTTP request filtering, probing offending or non-offending URLs gives the same
results across the many different proxies we tested (allowing for a small amount of noise in testing). This
suggests that GFW’s HTTP request filtering most likely happens at the border in a controlled and integrated manner.
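
The byte-range narrowing described in the cURL item above can be sketched as follows. This is only a minimal
illustration and not the script used in this study. The callback triggers_reset is a hypothetical stand-in for
"fetch this byte range with curl -r, then send a follow-up innocent request and report whether the connection was
reset". The sketch also assumes a single offending string and a perfectly reliable probe, which real probing does
not offer.

```python
from typing import Callable, List, Tuple


def curl_range_command(url: str, start: int, end: int) -> List[str]:
    """Build a curl command that fetches bytes start..end (inclusive) of url."""
    return ["curl", "-s", "-o", "/dev/null", "-r", f"{start}-{end}", url]


def locate_offending_span(length: int,
                          triggers_reset: Callable[[int, int], bool]) -> Tuple[int, int]:
    """Return the (start, end) byte positions, inclusive, of the offending string.

    Assumes exactly one offending string and a deterministic probe
    (both are simplifying assumptions of this sketch).
    """
    if not triggers_reset(0, length - 1):
        raise ValueError("the full page does not trigger a reset")

    # Step 1: smallest 'end' such that bytes 0..end still trigger a reset.
    lo, hi = 0, length - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if triggers_reset(0, mid):
            hi = mid
        else:
            lo = mid + 1
    end = lo

    # Step 2: largest 'start' such that bytes start..end still trigger a reset.
    lo, hi = 0, end
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if triggers_reset(mid, end):
            lo = mid
        else:
            hi = mid - 1
    return lo, end


# Offline simulation: a fake page and a fake probe replace the network.
page = b"x" * 1000 + b"falun" + b"y" * 500


def simulated_probe(start: int, end: int) -> bool:
    return b"falun" in page[start:end + 1]


span = locate_offending_span(len(page), simulated_probe)
```

Each binary search needs only about log2(page length) probes, which is what makes testing long pages practical.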

Regarding the candidate “offending” content for probing, we use two sources. 

• Greatfire.org’s list of blocked Wikipedia pages. Based on our study of GFW’s rulebook for HTTP request
filtering, we know the URLs for some of those inaccessible pages do not trigger GFW reset, so we suspect that
the page content is the culprit. We run curl for these pages, and in an iterative fashion, locate the exact byte
positions in the content that trigger GFW reset. Appendix A is the summary of the diagnosis of Greatfire.org’s
Wikipedia list. The full list and diagnosis are available at http://goo.gl/jUJpHb.

• Our previous study of GFW’s HTTP request filtering gives an extensive list of sensitive terms. We put these
terms, together with many derived terms, in pages on our custom web server, and also on a Wikipedia test page.
Then we test these pages (from many proxy IPs) to see if they trigger GFW reset. We also add various
HTML content, for example, that of search engine result pages, to our testing pages.

Note that a page may contain multiple phrases that can trigger GFW, and some rules require two parts (i.e.
“$term_A & $term_B”) or even three. A lot of care is needed to make precise and conclusive claims. Adding to
the difficulty, different (proxy) IPs may be subject to different rulebooks, and the resetting rates vary a lot
among these (proxy) IPs and are all much lower than 100%. Per rule, the testing effort is significantly bigger
than for the study of GFW’s HTTP request filtering. Luckily, the response filtering rulebook is two orders of
magnitude smaller, so we are able to get an accurate picture of this messy operation.
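
Because the resetting rates are well below 100%, a single fetch that goes through does not prove that the content
is innocent. The following sketch estimates how many repeated probes are needed before "no reset observed" becomes
convincing. It rests on assumptions that are not established in this study: each probe resets independently with
the same probability, and that probability is roughly known for the proxy in use. The function name probes_needed
is hypothetical.

```python
import math


def probes_needed(reset_rate: float, max_miss_probability: float = 0.01) -> int:
    """Number of probes so that offending content slips through all of them
    with probability at most max_miss_probability.

    With an independent reset probability p per probe, the chance that n
    probes all pass is (1 - p) ** n, so we need n >= log(alpha) / log(1 - p).
    """
    if not 0.0 < reset_rate <= 1.0:
        raise ValueError("reset_rate must be in (0, 1]")
    if not 0.0 < max_miss_probability < 1.0:
        raise ValueError("max_miss_probability must be in (0, 1)")
    if reset_rate == 1.0:
        return 1
    return math.ceil(math.log(max_miss_probability) / math.log(1.0 - reset_rate))
```

Under these assumptions, a proxy with a 20% resetting rate needs 21 clean probes before the chance of a missed
rule drops below 1%, which illustrates why the per-rule testing effort is so much larger here.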

5.2 GFW’s HTTP Response Filtering Rulebook 

In our study, we are able to identify 19 rules that GFW applies to HTTP responses. See Table 5.1. These terms can
be seen as the ultimate sensitive topics that the Chinese authorities want to block. Note that these rules are
fully vetted, i.e., substrings or variations not listed here will not trigger GFW. Also, even though we cannot
claim this list is complete, there should not be any obvious omissions, because we tested numerous pieces of
“sensitive” content. For example, there is no Tiananmen Incident-related term here; we confirmed this by
examining many pages (on and off the Wikipedia site) on this topic.


Rule                                Meaning

A: Rules targeting Wikipedia only
falun *                             as in Falun Gong, the most well known GFW term
flg *                               acronym for Falun Gong
wm                                  Falungong
                                    Dalai Lama
gSffiFi                             Free Tibet
SUM                                 Tibetan independence
                                    Tibetan government in exile
                                    snow mountain and lion flag

B: Rules partially deployed
& TOR                               Falun Dafa & Minghui web
                                    New Tang Dynasty TV station & Shen Yun performing arts group
                                    second stage & CCP Central Guard Bureau
& WM *                              Hu Haifeng & Nuctech (in traditional Chinese)

C: Rules with (near-)complete deployment
M                                   Namibia & Nuctech (in simplified Chinese)
                                    Namibia & Hu Haifeng
namibia & nuctech
namibia & huhaifeng
dongtaiwang.com                     a website hosting Freegate
MfnWM & gfw *                       burn after reading & gfw
^& google.com & images/nav_logo *

Table 5.1: GFW rulebook for HTTP response filtering. Terms with * are explained in more detail.

We examined these terms on Wikipedia pages and on my custom web server and probed from hundreds of proxies 
in China. They can be grouped into the following three types. 

• 8 rules only affect Wikipedia.org. 3 are about Falungong and the other 5 are about Tibet. A few highlights are: 

- “falun” is probably the most well known GFW term. There are interesting anecdotes about the Swedish
city of Falun. There is evidence that GFW applied this string to all internet traffic in the past, but
currently non-Wikipedia pages that contain this string no longer seem to trigger GFW reset. I tested this
on my custom web server, and also by probing various pages about the Swedish city of Falun from a large
number of proxy IPs in China.

- “flg” is the acronym of Falungong. This string is very short, so it causes a lot of collateral damage.
For example, many of the seemingly innocent Wikipedia articles reported as blocked by Greatfire.org are
blocked because of this tiny string, including the articles “Graphene”, “Same-sex_marriage”,
“homosexuality”, “Human_rights”, “Comparison_of_tablet_computers”, “莫言” 7, “日本地理” 8, “冷战” 9 and many
more. This is a particular problem because Wikipedia articles usually contain web links, and the forward
slash (“/”) in such a link is escaped as “%2F”. For example, the link www.barackobama.com/pdf/lgbt.pdf
appears as www.barackobama.com%2Fpdf%2Flgbt.pdf in the HTML source of the article “Same-sex_marriage”, so
the source contains “Flg” and the page is inaccessible from China.

- The term (Dalai Lama) shows up in the template {{Heft $$%&}} (Tibetan Buddhism), and
this template exists on many Tibet-related articles. The content of this template is hidden by default, but
the term exists in the page’s HTML source. This effectively makes the majority of Tibet-related pages
inaccessible in China, even though most of these pages have nothing to do with the Dalai Lama.

• 4 rules which are partially deployed. They trigger GFW reset only from a subset of proxy IPs, notably from 
the CERNET (China Education and Research Network) but not limited to CERNET. Note that if these terms 
are on Wikipedia.org, they trigger GFW reset from all proxy IPs I tested. Two terms in this category are about 
Falungong, the other two are noteworthy: 

- “■§!} & HSj.fE”. The first part is Hu Haifeng, son of Hu Jintao, and the second part is the company
Nuctech in traditional Chinese. This rule stems from a scandal regarding alleged bribery of Namibian
officials by Nuctech. There are four rules in the “(near-)complete deployment” category about this scandal,
two in English and two in simplified Chinese. I examined this thoroughly: other combinations of terms and of
traditional/simplified versions are fine. My hypothesis is that the Chinese censors did study web content
carefully. First they found that web pages that mention neither Namibia in Chinese nor “Namibia” are
“clean”, so they just deployed the four rules. Later they discovered that some pages with Hu Haifeng and
Nuctech in the traditional form are also “bad”, so this rule was deployed separately, and to a different
set of hardware.

- & ^^irUjlj”. The first part is “second stage” and the second part is “CCP Central Guard
Bureau”. This is a very odd case but it is fully vetted. This rule was discovered by examining the
Wikipedia article http://zh.wikipedia.org/wiki/中共 (“中共” is the short name for the Chinese Communist
Party), which is reported by Greatfire.org as blocked. There is web content on the internet by somebody
who claims to be from the CCP Central Guard Bureau and leaks “secrets” of top Party officials, but I do not
know whether this rule was motivated by that content or by this Wikipedia article. This arcane case suggests
that there might be more such odd cases that my current investigation did not catch.

• 7 rules which are deployed to cover all Chinese users. For almost all the proxy IPs I tested, these terms trigger 
GFW reset. Four of these are about the aforementioned Hu Haifeng-Nuctech-Namibia scandal, one is a website 
which hosts the GFW-circumvention tool Freegate, the other two are noteworthy: 

- & gfw”. The first part is “burn after reading”. This rule was discovered by examining
the Wikipedia article http://zh.wikipedia.org/wiki/4 1 SI. A widely circulated article with the title
“iMIinSP^- GFW&tl^Itttt^^E” (“Burn after reading - the past life and present form of GFW”) has been online
since 2009. The author is anonymous, but the article contains unprecedented details of GFW operations.
This rule shows the extreme sensitivity of this article.

- & google.com & images/nav_logo”. The first part is Li Changchun, China’s Propaganda Chief
from 2002 to 2012. This rule was discovered when we put the content of a Google search result page together
with all the known sensitive terms in a probing target. This rule is significant in that it is specifically
devised to target Google search pages that contain the name “Li Changchun”. The third part,
“images/nav_logo”, is there to ensure that the page is a Google search result page. Li Changchun is the only
Party official who receives this “treatment”.
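
The accidental “flg” match described above can be reproduced offline with a few lines. This is a sketch under the
assumption, taken from the case-insensitivity finding in Section 5.3, that the rule is a plain case-insensitive
substring match on the HTML source. The helper name matches_term is hypothetical.

```python
from urllib.parse import quote


def matches_term(html_source: str, term: str) -> bool:
    """Case-insensitive substring match, as a simple model of one rule."""
    return term.lower() in html_source.lower()


# The link as a reader sees it, and as it appears escaped in the HTML source.
link = "www.barackobama.com/pdf/lgbt.pdf"
escaped_link = quote(link, safe="")  # "/" becomes "%2F"

plain_hit = matches_term(link, "flg")            # no "flg" in the readable link
escaped_hit = matches_term(escaped_link, "flg")  # "%2Flgbt" contains "Flg"
```

The readable link is harmless; only the escaped form, where “%2F” is followed by “lgbt”, produces the three bytes
that the rule looks for.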


7 莫言: Mo Yan, Chinese author who won the Nobel Prize in Literature in 2012.

8 日本地理: Japanese geography.

9 冷战: Cold War.

10 4 I |S||Wl^§®44il?{ii^@^^ l JS: List of keywords filtered by China’s internet censorship software.

5.3 Learnings and Mysteries of GFW’s HTTP Response Filtering

Unlike HTTP request filtering, GFW’s exact mechanism for scanning HTTP responses is very hard to study: it is
fairly straightforward to construct various HTTP requests for probing, but much harder to construct HTTP responses
at will. In particular, I have not been able to find a way to forge the source IP address in the IP packet header.
Therefore, even though we have obtained a quite comprehensive list of filtering rules for GFW’s HTTP response scan
in this study, we have obtained only a basic understanding of GFW’s practice of HTTP response scanning.

• GFW’s HTTP request filtering and response filtering are two separate systems. First, their filtering rules are
entirely different. Second, GFW’s HTTP request filtering is homogeneous and has a near-perfect trigger rate,
whereas GFW’s HTTP response filtering varies hugely, not only in the triggering rates but also in the filtering
rules in effect. For example, CERNET (China Education and Research Network) seems to have all the rules in place,
but some other ISPs only have a subset. This heterogeneity, i.e. different ISPs being subject to different
filtering rules, actually provides an opportunity to study GFW topology. This could be a future research topic.

• One remarkable finding is that GFW does not just look at individual TCP packets; instead, it “remembers” the
entire TCP session to look for offenders. This becomes evident when the filtering rule is “$term_A & $term_B”
and the two terms show up far apart (hundreds of thousands of bytes from each other) on a webpage: GFW is
still able to reset the connection. Achieving this requires significant investment in infrastructure, which is
probably also the reason why the rulebook is so much smaller for HTTP response filtering than for HTTP request
filtering.

• Given the list, I also studied the GBK and Big5 encodings for each of the rules. 13 terms have a proper GBK
encoding and 1 term has a proper Big5 encoding. Indeed, when the UTF-8 version triggers GFW reset, the
corresponding GBK or Big5 version does as well. This is consistent with GFW’s HTTP request scan, in which GBK
and Big5 encodings are filtered as well.

• We studied upper cases and lower cases of these rules. GFW’s HTTP response filtering is case insensitive, which 
is consistent with GFW’s HTTP request filtering. 
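
The observations in the list above can be summarized in a small behavioural model: a rule is a set of terms joined
by “&”, every term must appear somewhere in the same TCP session, matching is case insensitive, and the GBK and
Big5 encodings of a term count as well as the UTF-8 one. The sketch below is only a model of what we observe from
the outside, not GFW’s actual implementation, which is unknown to us. The class name SessionScanner and the
tail-buffer handling of terms split across packets are assumptions made for this illustration.

```python
from typing import List, Set

ENCODINGS = ("utf-8", "gbk", "big5")


def byte_forms(term: str) -> Set[bytes]:
    """All byte encodings of a term that exist, lowercased at the byte level."""
    forms = set()
    for encoding in ENCODINGS:
        try:
            forms.add(term.encode(encoding).lower())
        except UnicodeEncodeError:
            pass  # the term has no proper encoding in this character set
    return forms


class SessionScanner:
    """Tracks which terms of one "$term_A & $term_B" rule a session has shown."""

    def __init__(self, rule: str) -> None:
        self.terms: List[Set[bytes]] = [byte_forms(t.strip()) for t in rule.split("&")]
        self.seen = [False] * len(self.terms)
        longest = max(len(form) for forms in self.terms for form in forms)
        self.keep = longest - 1  # bytes carried over to catch split terms
        self.tail = b""

    def feed(self, packet: bytes) -> bool:
        """Feed one packet payload; return True once every term has been seen."""
        data = self.tail + packet.lower()
        for index, forms in enumerate(self.terms):
            if not self.seen[index] and any(form in data for form in forms):
                self.seen[index] = True
        self.tail = data[-self.keep:] if self.keep else b""
        return all(self.seen)
```

For example, a scanner built from the rule “namibia & nuctech” stays silent while only one of the two terms has
passed by, no matter how many bytes follow, and fires as soon as the second term arrives in any later packet.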

Despite these learnings, we have a major unknown issue and a few puzzling mysteries. 

• We don’t know exactly how GFW specifically targets Wikipedia site for these rules. I tried putting those Type 
A terms in a page on the custom web server but that does not trigger GFW reset. I also tried adding the strings 
“zh.wikipedia.org”, “en.wikipedia.org”, and Wikipedia.org IP addresses (in digits and in byte format) to the 
page, still no resets. My hypothesis is that GFW looks at the source IP address field in the IP packet header to 
look for the Wikipedia.org IP, but we are unable to confirm this. Interested readers and researchers are welcome 
to study this more deeply. 

• A great mystery is the “masking effect” of the term “lift:” (falun). The phenomenon is as follows. When we
fetch a (non-Wikipedia) page which contains an offending text, it triggers GFW reset as expected. However, if we
add this term to that page, fetching the page no longer triggers GFW reset. The same holds when we specify a byte
range with curl: if that byte range contains both an offending string and this term, it does not trigger GFW
reset. This phenomenon is reproducible for all the rules (except for the rule “'Sikh’S; & |M|”, which contains
the term itself) on a non-Wikipedia page from many (but not all) proxy IPs we tested. It is as if the presence of
this term puts the GFW filtering device into a different mode. My speculation is that GFW may go into a
surveillance mode when it sees this term, hence skipping the resetting, either intentionally or due to a bug.
Interested readers and researchers are welcome to study this more deeply.

• One other mystery is about the term “flg” on Wikipedia.org. This string on the Wikipedia.org site does trigger
GFW reset. However, if we run curl to fetch only these three bytes (with the “-r” flag), it does not trigger GFW
reset. If we include something extra by making the byte range longer than 3 bytes, then it does trigger GFW reset.
This extra content can be anything: arbitrary characters, whitespace or punctuation. Other, longer rules (e.g.
the five-byte rule “falun”) do not have this property. I do not know the reason for this phenomenon. It could be
a result of a special GFW mechanism for string matching.

6 Wikipedia Specifics

6.1 Wikipedia Content Structure 

When people talk about Wikipedia, they often just mean the articles (or, “real” content). These are the pages a
normal Wikipedia visitor reads. But underneath that, Wikipedia actually hosts many more types of content. These
types are called namespaces. Table 6.1 has more details, based on information from
http://www.mediawiki.org/wiki/Manual:Namespace.


Index & Name        Purpose                       Counts (zh | en)         URL

0: (Main)           “Real” content; articles      1,177,122 | 10,457,990   [en|zh].wikipedia.org/wiki/$page_title
1: (Main) talk                                    172,707 | 5,215,221      [en|zh].wikipedia.org/wiki/Talk:$page_title
2: User             User pages                    58,497 | 1,741,811       [en|zh].wikipedia.org/wiki/User:$page_title
3: User talk                                      562,241 | 9,061,249      [en|zh].wikipedia.org/wiki/User_talk:$page_title
4: Project          Information about the wiki    27,953 | 793,686         [en|zh].wikipedia.org/wiki/Wikipedia:$page_title
5: Project talk                                   4,097 | 210,611          [en|zh].wikipedia.org/wiki/Wikipedia_talk:$page_title
6: File             Media description pages       35,583 | 843,439         [en|zh].wikipedia.org/wiki/File:$page_title
7: File talk                                      4,162 | 176,685          [en|zh].wikipedia.org/wiki/File_talk:$page_title
8: MediaWiki        Site interface customization  6,846 | 1,895            www.mediawiki.org/wiki/MediaWiki:$page_title
9: MediaWiki talk                                 299 | 1,081              www.mediawiki.org/wiki/MediaWiki_talk:$page_title
10: Template        Template pages                861,514 | 524,470        [en|zh].wikipedia.org/wiki/Template:$page_title
11: Template talk                                 6,226 | 204,302          [en|zh].wikipedia.org/wiki/Template_talk:$page_title
12: Help            Help pages                    303 | 1,327              [en|zh].wikipedia.org/wiki/Help:$page_title
13: Help talk                                     66 | 628                 [en|zh].wikipedia.org/wiki/Help_talk:$page_title
14: Category        Category description pages    153,155 | 1,053,966      [en|zh].wikipedia.org/wiki/Category:$page_title
15: Category talk                                 14,739 | 677,383         [en|zh].wikipedia.org/wiki/Category_talk:$page_title
others              custom namespaces             6,946 | 160,434

Table 6.1: Wikipedia namespaces; counts are based on the 2013-09-08 dump (zh) and the 2013-09-04 dump (en).

In the beginning, I was considering examining only the articles (i.e. namespace=0). Then an interesting case
came to my attention. I saw that Greatfire.org’s Wikipedia tracking page shows that the page
http://zh.wikipedia.org/wiki/Wikipedia: H ® IS n't cil 11 is blocked. I verified that this is indeed the case.
This page is in the Project namespace, so I expanded my examination from Articles only to all applicable
namespaces. To be precise, for the Chinese version, we tested all entries except for MediaWiki (namespace=8),
MediaWiki talk (namespace=9) and the “customized” namespaces (namespace >=100). For the English version, we
tested entries in Article (namespace=0), User (namespace=2), Project (namespace=4), File (namespace=6), Template
(namespace=10), Help (namespace=12) and Category (namespace=14), i.e., we ignored the “Talk” namespaces for
English.
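
The namespace selection just described can be written down as a small filter. This is a sketch for illustration
only. The function name is_tested is hypothetical, and the sketch considers only the non-negative namespace
indexes listed in Table 6.1.

```python
# English namespaces that were tested (all "Talk" namespaces are ignored).
EN_TESTED = {0, 2, 4, 6, 10, 12, 14}

# Chinese namespaces that were skipped: MediaWiki and MediaWiki talk.
ZH_SKIPPED = {8, 9}


def is_tested(language: str, namespace: int) -> bool:
    """Return True if pages of this namespace were probed for this language."""
    if namespace < 0:
        raise ValueError("only the namespaces of Table 6.1 are modelled here")
    if language == "zh":
        # Everything except MediaWiki (8, 9) and customized namespaces (>= 100).
        return namespace not in ZH_SKIPPED and namespace < 100
    if language == "en":
        return namespace in EN_TESTED
    raise ValueError("only 'zh' and 'en' were examined exhaustively")
```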

Indeed, we found 7 GFW rules against Project, 25 rules against User, 2 rules against Category, 2 rules 
against File, and 23 rules against Talk. 

Also, Wikipedia has smart localization schemes for these namespaces. For example, to access the User page
for the Chinese version, one can use the English URL: “zh.wikipedia.org/wiki/User:$user”, or the localized URL:
“zh.wikipedia.org/wiki/iTlPAuser”. The latter actually redirects to the former (using an HTTP 301 redirect). I
examined the localized versions of the 25 rules against User; none of them is on the GFW rulebook, so I decided
to ignore these localized URLs.


11 M ffi'fFiSfl’fiffl : discussion for articles deletion. Its English counterpart is
en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion


6.2 Nine Variants of the Chinese Wikipedia

Usually we think of the Chinese Wikipedia as “one version”, but Wikipedia actually has two “main variants” and
five “fine variants” of Chinese (Table 6.2). The two main variants are simplified Chinese and traditional Chinese.
These two are deprecated, but their legacy URLs still work. The five current finer variants are: 大陆简体 (Mainland
Simplified), 台灣正體 (Taiwan Traditional), 香港繁體 (Hong Kong Traditional), 澳門繁體 (Macau Traditional) and
马新简体 (Malaysia/Singapore Simplified). Currently Wikipedia uses the canonical URL zh.wikipedia.org/wiki/$term,
and the rendering follows the user’s preference setting. The user can also manually select a version from a
pull-down menu, which overrides the default setting and yields a distinctive URL like
zh.wikipedia.org/zh-cn/$term. Furthermore, there is a general URL like zh.wikipedia.org/zh/$term which renders
just like the canonical URL zh.wikipedia.org/wiki/$term. So the total number of valid URL patterns that we need
to test is 9.


Variant                                   URL                               GFW Rule Counts

Canonical                                 zh.wikipedia.org/wiki/$term       259
Equivalent alias                          zh.wikipedia.org/zh/$term         35
大陆简体 (Mainland Simplified)            zh.wikipedia.org/zh-cn/$term      85
台灣正體 (Taiwan Traditional)             zh.wikipedia.org/zh-tw/$term      51
香港繁體 (Hong Kong Traditional)          zh.wikipedia.org/zh-hk/$term      36
澳門繁體 (Macau Traditional)              zh.wikipedia.org/zh-mo/$term      0
马新简体 (Malaysia/Singapore Simplified)  zh.wikipedia.org/zh-sg/$term      26
简体 (Simplified)                         zh.wikipedia.org/zh-hans/$term    29
繁體 (Traditional)                        zh.wikipedia.org/zh-hant/$term    27

Table 6.2: Variants of Chinese Wikipedia. Rule counts exclude a few corner cases.

To the GFW devices, these are all different URLs, so a rule like zh.wikipedia.org/wiki/陈光诚 12 will
not be able to block a user request like zh.wikipedia.org/zh-cn/陈光诚. However, a broad rule like
zh.wikipedia.org & 刘晓波 13 can block all these variants since their URLs all contain the string
zh.wikipedia.org. GFW uses broad rules for many terms, but it uses prefix rules for many other terms, covering
only a subset of these variants. This may seem puzzling to an outside observer, but there are two possible
explanations for this phenomenon. Firstly, the current “incomplete” block does its job well, because the majority
of Wikipedia visits use the canonical path, and GFW is never assumed to be watertight. Secondly, it is possible
that GFW operators simply accept “orders” from their supervising authorities, i.e. the censorship officials. These
officials may simply order: “block 刘晓波 on Wikipedia”, in which case the GFW operators will use the broad rule;
or they may pass on one URL or a list of URLs for GFW to block, e.g. zh.wikipedia.org/zh-cn/陈光诚, in which case
the GFW operators would just deploy this string.
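
The difference between a prefix rule and a broad rule can be illustrated with a short sketch. It models a rule as
a case-insensitive substring match on the URL, with “&” joining parts that must all be present. The term
Example_Term is a placeholder rather than an actual GFW term, and the function names are hypothetical. Real
requests carry non-ASCII titles in percent-encoded form, which this sketch leaves out.

```python
from typing import List

# The nine URL paths discussed in this section.
VARIANT_PATHS = ["wiki", "zh", "zh-cn", "zh-tw", "zh-hk", "zh-mo", "zh-sg", "zh-hans", "zh-hant"]


def variant_urls(term: str) -> List[str]:
    """All nine URLs under which one Chinese Wikipedia page can be requested."""
    return [f"zh.wikipedia.org/{path}/{term}" for path in VARIANT_PATHS]


def rule_matches(rule: str, url: str) -> bool:
    """True if every "&"-separated part of the rule occurs in the URL."""
    parts = [part.strip().lower() for part in rule.split("&")]
    return all(part in url.lower() for part in parts)


urls = variant_urls("Example_Term")
blocked_by_prefix_rule = [u for u in urls if rule_matches("zh.wikipedia.org/wiki/Example_Term", u)]
blocked_by_broad_rule = [u for u in urls if rule_matches("zh.wikipedia.org & Example_Term", u)]
```

In this model the prefix rule stops only the canonical URL, while the broad rule stops all nine.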

We tested all 9 variants thoroughly in this study. See Table 6.2 for the counts of rules targeting each variant.
There are 259 rules for the canonical path. For the eight non-canonical variants, there are 289 rules, covering
161 unique terms. It is noteworthy that 78 out of these 161 terms have one or more non-canonical variants
blocked, but not the canonical one. In my view, this is a strong indication that GFW operators and/or their
supervising authority have a poor understanding of Wikipedia.

A little more history about these variants. According to http://en.wikipedia.org/wiki/Chinese_wikipedia, in the
very beginning, the Simplified and Traditional Chinese Wikipedias were entirely separate. But the content of the
two was inevitably similar, so in 2005, it was decided to merge the two and create a unified Chinese Wikipedia.
This effort has been quite a success. It handles the simplified-traditional conversion seamlessly, and also
handles different vocabularies among the five regions.

6.3 Wikipedia’s Handling of Traditional and Simplified Chinese Characters 

The Chinese language has two main writing systems: Traditional Chinese and Simplified Chinese. For the common
characters there is usually a one-to-one correspondence, but there are many cases where multiple traditional
Chinese characters are mapped to the same simplified character.


12 陈光诚: Chen Guangcheng, the legendary blind civil rights activist, whose escape from house arrest in Apr 2012
raised international attention.

13 刘晓波: Liu Xiaobo, the 2010 Nobel Peace Prize laureate.

In the digital world, a traditional character and its simplified counterpart are just two different objects,
represented by entirely different encodings. For example, the traditional character “漢” has the UTF-8 encoding
%E6%BC%A2, while its simplified version “汉” has the UTF-8 encoding %E6%B1%89.

For GFW, blocking one version does not block the other version. In fact, we see numerous cases where one is 
blocked while the other is not. For a target string, GFW operators have to take the effort to find and deploy both 
versions in order to block both the traditional and simplified versions. 
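
The following sketch shows why a rule for one script cannot catch the other. It percent-encodes the traditional
and simplified characters whose UTF-8 encodings are given above and models a rule as a substring match on the
encoded URL. The function names are hypothetical, and the substring model is an assumption of this sketch.

```python
from urllib.parse import quote

TRADITIONAL = "漢"  # UTF-8: E6 BC A2
SIMPLIFIED = "汉"   # UTF-8: E6 B1 89


def request_url(title: str) -> str:
    """The URL as it travels on the wire, with the title percent-encoded."""
    return "zh.wikipedia.org/wiki/" + quote(title)


def rule_matches(rule_title: str, requested_title: str) -> bool:
    """Model a rule as a substring match on the percent-encoded URL."""
    return quote(rule_title) in request_url(requested_title)


encoded_traditional = quote(TRADITIONAL)
encoded_simplified = quote(SIMPLIFIED)
```

Since the two encodings share only their first byte, a rule deployed for the simplified form never matches a
request for the traditional form, and vice versa.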

In Wikipedia land, the Wikipedians take a smart and ambitious approach: automatic traditional-simplified
conversion. This is a rather nontrivial task, because there are a lot of nitty-gritty details in the conversion.
Luckily, Wikipedia has enough volunteers, and the conversion system is mostly correct and seamless, despite a
small amount of inevitable complexity and confusion. The page zh.wikipedia.org/wiki/Wikipedia:繁简处理 14
has more details.

This automatic conversion has implications for GFW blockage. We cover this aspect in Section 6.4.

6.4 Wikipedia’s Two Types of Redirects 

There are two types of redirects in Wikipedia: the soft redirect and the hard redirect. The former handles the
redirection behind the scenes without a further request from the browser. The latter involves the HTTP 301
response code (301 Moved Permanently), which directs the browser to send another HTTP request for the new target.

This has implications for GFW blockage. Consider the case where a page (A) redirects to another page (B), and
page A is not on the GFW blacklist but page B is. If this redirect is a soft one, the user can view page A
without interruption; if it is a hard one, the second request names page B, so the user’s connection to Wikipedia
will be reset.

Soft redirects are set up in the article source using the #REDIRECT keyword. For example, the source of the
English article “Tiananmen_Square_Massacre” is simply #REDIRECT [[Tiananmen Square protests of 1989]],
so a page request for http://en.wikipedia.org/wiki/Tiananmen_Square_Massacre would render the content of the page
http://en.wikipedia.org/wiki/Tiananmen_Square_protests_of_1989. This redirection happens on the server side.

Soft redirects are also used for many pairs of page titles in traditional and simplified Chinese. Prior to the
traditional-simplified unification, for many subjects, two articles were created, one with a title in traditional
Chinese and one with a title in simplified Chinese. They could have entirely separate content, or one could be a
redirect to the other. During the unification effort, all these cases were merged to the same content with one
article redirecting to the other. For example, Chinese Wikipedia currently has both
http://zh.wikipedia.org/wiki/北京大学 15 (simplified Chinese) and http://zh.wikipedia.org/wiki/北京大學 16
(traditional Chinese), and the latter soft-redirects to the former.

However, for many newer pages, e.g. those created after the unification, only one form exists in Wikipedia, and a
page request for the other form is handled by a hard redirect (i.e. an HTTP 301 redirect). For example, Chinese
Wikipedia has an article for 许志永 17 (simplified Chinese), but it does not have an article for the traditional
form 許志永 18. A page request to http://zh.wikipedia.org/wiki/許志永 is hard redirected to
http://zh.wikipedia.org/wiki/许志永.

Hard redirects are also used for trivial typographical conversions. For example,
http://en.wikipedia.org/wiki/Crown%20Prince%20Party (%20 is whitespace) is hard redirected to
http://en.wikipedia.org/wiki/Crown_Prince_Party (for what it is worth, the latter page is further soft-redirected
to http://en.wikipedia.org/wiki/Princelings). Nonstandard upper/lower casing in system keywords also involves a
hard redirect; for example, http://en.wikipedia.org/wiki/TALK:Princelings is hard redirected to the standard form
http://en.wikipedia.org/wiki/Talk:Princelings. Also note that page titles are case sensitive except for the first
character. A nonstandard case for the first character gives a hard redirect, e.g.
http://en.wikipedia.org/wiki/princelings is hard redirected to http://en.wikipedia.org/wiki/Princelings, while
http://en.wikipedia.org/wiki/PRincelings gives a “Wikipedia does not have an article with this exact name” page.

Investigating which kind of redirect is happening is straightforward. For a soft redirect, one can see a small
subtitle like (Redirected from ) (English) or (重定向自 ) (simplified Chinese) under the title line of the main
article. For a hard redirect, one can open the developer tools console in the browser and check the requests the
browser has made. If the original page request gets a status code of 301 and a new page request is made, then a
hard redirect has happened.
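
The same check can be done outside a browser. The sketch below fetches a page without following redirects and
classifies the response. It is only an illustration: the function names are hypothetical, non-ASCII titles are
assumed to be percent-encoded by the caller, and a 301 issued for an unrelated reason (for example a site-wide
change of scheme or host) would be reported as a hard redirect too.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit


def classify_response(status: int, location: Optional[str]) -> str:
    """Classify one HTTP response according to the two redirect types."""
    if status == 301 and location:
        return "hard redirect"
    if status == 200:
        # A soft redirect is resolved on the server, so it looks like a normal page.
        return "served directly (possibly a soft redirect)"
    return "other"


def fetch_status(url: str) -> Tuple[int, Optional[str]]:
    """Request url once, without following redirects; return (status, Location)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        connection = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        connection = http.client.HTTPConnection(parts.netloc, timeout=10)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        connection.request("GET", path, headers={"User-Agent": "redirect-check-sketch"})
        response = connection.getresponse()
        return response.status, response.getheader("Location")
    finally:
        connection.close()
```

A call such as classify_response(*fetch_status(url)) then tells the two cases apart; http.client is used here
because it never follows redirects on its own.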


14 繁简处理: Handling of traditional and simplified Chinese.

15 北京大学: Peking University, in simplified Chinese.

16 北京大學: Peking University, in traditional Chinese.

17 许志永: Xu Zhiyong, in simplified Chinese.

18 許志永: Xu Zhiyong, in traditional Chinese.

6.5 Other Chinese-related Wikipedias

Our focus in this study is the Chinese and English versions of Wikipedia, but there are actually 7 more
Wikipedias that are related to Chinese. These are, roughly, “dialects” of the Chinese language. See Table 6.3.

These Wikipedias are much smaller and less popular than the Chinese one, so I did not test them exhaustively.
Still, I tested all the GFW terms that affect the Chinese and English Wikipedias on these versions. Indeed, we
found that GFW does target the Cantonese and Gan Wikipedias for a few terms, but not the other versions.

The four terms for which the Cantonese Wikipedia is targeted are: 丁子霖 19, 劉曉波 20, 六四事件 21, and
天安門事件 22. The one term for which the Gan Wikipedia is targeted is 盤古樂隊 23. The connection here is that
the Pangu band originated in Jiangxi Province, where the Gan dialect is dominant.


Versions                       URL                                      GFW Rule Counts

Classical Chinese Wikipedia    zh-classical.wikipedia.org/wiki/$term    0
Minnan Wikipedia               zh-min-nan.wikipedia.org/wiki/$term      0
Cantonese Wikipedia            zh-yue.wikipedia.org/wiki/$term          4
Mindong Wikipedia              cdo.wikipedia.org/wiki/$term             0
Wu Wikipedia                   wuu.wikipedia.org/wiki/$term             0
Hakka Wikipedia                hak.wikipedia.org/wiki/$term             0
Gan Wikipedia                  gan.wikipedia.org/wiki/$term             1

Table 6.3: 7 Wikipedias related to Chinese


7 Caveats 

The author strongly urges the reader to read this section carefully, in order to get the most accurate understanding of 
our findings. 

7.1 On Completeness 

Even though the author uses the word “complete” in the title of this report, the intention is that our list
covers all GFW string matching rules that affect the current Chinese and English Wikipedias through HTTP request
filtering. This is the right way to interpret the word “complete”. I believe that not all terms on the GFW
rulebook targeting the Chinese/English Wikipedias have been identified, because of the following scenarios:

• GFW may contain Wikipedia-related rules that do not affect any actual Wikipedia pages. This could be because
the page never existed, or because the page has been deleted. Since in this study we only examine pages in the
current snapshot (2013-09-08), we may miss those non-existent or deleted pages.

• GFW’s blacklist is a mess. It is highly likely that certain rules are redundant, i.e. one rule may be strictly
dominated by another rule. For example, we know “.wikipedia.org/Princeling” is one rule. If GFW also has
“en.wikipedia.org/Princeling” or “.wikipedia.org/Princelings” on its rulebook, there is no way for us to verify
that. What we report in this study is fully vetted, and should cover all verifiable rules.

• As we have mentioned in Section 4, the GFW rulebook contains many URLs which may affect Wikipedia. Only
those URL terms which affect pages in the Chinese Wikipedia or articles (i.e. namespace=0) in the English
Wikipedia are included in this report. These URL terms are covered in Table 8.11.
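
The notion of a dominated rule in the second item above can be made concrete with a short sketch. It treats each
rule as a single case-insensitive substring pattern, which is a simplification (multi-part “&” rules are not
handled), and the function names are hypothetical.

```python
from typing import List


def dominates(general: str, specific: str) -> bool:
    """True if every URL matched by 'specific' is also matched by 'general'.

    For single substring rules this holds when 'general' is a proper
    substring of 'specific'.
    """
    return general.lower() != specific.lower() and general.lower() in specific.lower()


def verifiable_rules(rules: List[str]) -> List[str]:
    """Rules that are not dominated by any other rule in the list."""
    return [rule for rule in rules
            if not any(dominates(other, rule) for other in rules)]


candidate_rules = [
    ".wikipedia.org/Princeling",
    "en.wikipedia.org/Princeling",
    ".wikipedia.org/Princelings",
]
observable = verifiable_rules(candidate_rules)
```

Any URL that would trigger one of the two longer rules already triggers the shortest one, so probing from the
outside can confirm only the shortest rule.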


19 丁子霖: Ding Zilin, organizer of the Tiananmen Mothers. 

20 劉曉波: Liu Xiaobo, in traditional Chinese. 

21 六四事件: June 4th Incident, a.k.a. the Tiananmen Protest in 1989. 

22 天安門事件: Tiananmen Incident, in traditional Chinese. 

23 盤古: Pangu, also named “Punk God”, a Chinese avant-garde punk band banned in China, in traditional Chinese. 
One more related note: our list does include certain rules that do not affect actual Wikipedia titles. This is because, 
after we have identified a term, we also investigate its various forms, e.g. its traditional/simplified counterpart, 
other variants and other versions of Wikipedia. Probing these other forms may surface more GFW rules that target 
Wikipedia, but the pages that these rules target may not actually exist. Some of them are (hard) redirected to an 
actual Wikipedia page; some give the “Wikipedia does not have an article with this exact name” page. As long as 
GFW has such a rule on its rulebook, it is included in this report. 

In version 1.0, we did not study GFW’s HTTP response filtering, so we cautioned readers that our rulebook did 
not capture all Wikipedia pages that are not accessible from China. In this version, we studied GFW’s HTTP response 
filtering extensively and identified a small but impactful list of filtering rules. If a Wikipedia page’s HTML source 
offends any of these rules, users in China will not be able to access that page successfully. We are reasonably sure that 
we have captured all of GFW’s HTTP response filtering rules that affect Wikipedia, but we skipped compiling the list of 
affected pages. This is because 1, Wikipedia content is in flux; 2, examining the HTML source of all Wikipedia pages 
is a daunting or infeasible task; and 3, unlike the HTTP request filtering list, the list of pages affected by HTTP 
response filtering does not give much more information about the China censors’ motivation or priorities. 

If readers encounter a blocked Wikipedia page, they can first check the complete rulebook in Section 8. If the 
page’s URL does not offend any rule there, they can try to get the HTML source and search for the rules in Table 5.1. 
Also keep in mind the following potential issues: 

• A page whose URL does not offend the GFW rulebook might be inaccessible if it is “hard” redirected (see Section 
6.4) to an offending page. 

• Due to network topology, different ISPs/regions may connect to Wikipedia by different paths, some of which 
might not be affected by GFW interference. 

• GFW does not operate at 100%. A certain level of randomness does exist. 

Anyway, if a Wikipedia page is inaccessible, but its URL does not offend the rulebook in Section 8 and its HTML 
source does not offend any rules in Table 5.1, the author welcomes such data points (by email) and will investigate. 
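
The two-step check described above can be sketched as a small function. This is a minimal illustration, not the author's tooling: it models both kinds of filtering as plain substring search, which is an assumption, and the name `diagnose`, the sample page and the placeholder keyword are hypothetical.

```python
def diagnose(url, html, request_rules, response_rules):
    """Classify why a page might be blocked, following the two checks above.

    url            -- "host/path" string of the page, without the scheme
    html           -- HTML source of the page ("" if it could not be fetched)
    request_rules  -- rule strings taken from the Section 8 rulebook
    response_rules -- keyword strings taken from Table 5.1
    Matching is modelled as plain substring search (an assumption).
    """
    for rule in request_rules:
        if rule in url:
            return ("request", rule)
    for keyword in response_rules:
        if keyword in html:
            return ("response", keyword)
    return ("unexplained", None)


request_rules = ["en.wikipedia.org/wiki/Tank_Man"]  # rule listed in the rulebook
response_rules = ["PLACEHOLDER_KEYWORD"]            # hypothetical stand-in for Table 5.1

print(diagnose("en.wikipedia.org/wiki/Tank_Man", "", request_rules, response_rules))
print(diagnose("en.wikipedia.org/wiki/Some_Page", "<p>ok</p>", request_rules, response_rules))
```

An "unexplained" result is where the listed caveats apply: hard redirects, differing network paths and GFW's own randomness are not modelled here.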

7.2 Possible Reaction by GFW 

This report will likely embarrass some GFW operators (if they ever see it), not only in revealing these supposed 
“secrets”, but also in revealing the porousness and lack of maintenance of the GFW rulebook. In a way, this study is 
an “audit” on GFW operators’ work. They may fix some cases, or even give it an overhaul. 

In particular, I’d expect GFW to revise all those terms which are blocked on a variant path but not on the canonical 
path. It is apparent that China censors intended to block these terms, but the current implementation does not actually 
do the job. There are 78 such cases, including the high profile case of 许志永 24, for whom the variant 
zh.wikipedia.org/zh-cn/许志永 is blocked, but the standard path zh.wikipedia.org/wiki/许志永 is easily accessible. 

The author will keep an eye on such possible changes, and readers are welcome to do the same. 
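
As a hedged illustration of how such variant-only cases could be listed from the rulebook, the sketch below groups prefix rules by term and reports terms that have no rule on the canonical "/wiki/" path. The function name and the `ExampleTerm` entry are hypothetical, and broad rules (which would also cover the canonical path) are deliberately ignored.

```python
CANONICAL = "/wiki/"


def variant_only_terms(prefix_rules):
    """prefix_rules: iterable of (path, term) pairs, e.g. ("/zh-cn/", "Foo").

    Returns the terms that have at least one rule on a variant path but no
    rule on the canonical "/wiki/" path, sorted for stable output.
    """
    paths_by_term = {}
    for path, term in prefix_rules:
        paths_by_term.setdefault(term, set()).add(path)
    return sorted(t for t, paths in paths_by_term.items() if CANONICAL not in paths)


rules = [
    ("/zh/", "Twitter"),         # as listed in Table 8.2
    ("/zh-hk/", "Twitter"),      # as listed in Table 8.2
    ("/wiki/", "ExampleTerm"),   # hypothetical term blocked on the canonical path
    ("/zh-cn/", "ExampleTerm"),
]
print(variant_only_terms(rules))
```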


24 许志永: Xu Zhiyong, a prominent civil rights activist. His arrest on July 16, 2013 attracted wide attention. 
8 GFW Rulebook for Wikipedia 

The GFW Rulebook for Wikipedia has 919 entries (excluding URL terms). I group them into the following eleven 
categories: 

• Events. These are articles covering recent events. From here we can gain insights on what is deemed sensitive 
by China’s censors. It is interesting to see that lots of events in 2010 and 2011 are blocked. The counts for 2012 
and 2013 are much smaller. This suggests that China’s censors paid more attention to Wikipedia in 2010 and 2011, 
and that their attention has declined since. 

• Media, Publications, Censorship and Circumvention. 

• Organizations, Political & Government. 

• People. I further group people terms into three subcategories: Dissidents, Government Officials, Miscellaneous. 
Note that people related to Hong Kong, Taiwan, Tibet, Uyghur and Falungong are described in their respective 
categories. Five more names of people are in the SELF category (see below). 

• Regional. This group contains four subgroups for topics related to Hong Kong, Taiwan, Tibet and Uyghur. 

• Falungong. This group is for terms related to Falungong, a religious group persecuted in mainland China. It is 
somewhat unexpected that there are only 19 Falungong-related rules, and many Falungong related articles are 
not on GFW’s rulebook. This does not mean those articles are accessible in China though, because GFW has 
five filtering rules in its HTTP response scan that target Falungong. 

• Tiananmen. This group is for terms related to the 1989 Tiananmen Protest (a.k.a., the June 4th Incident). We 
can see that Tiananmen Incident is the most sensitive topic the China censors want to block. 116 rules are in this 
section, including all kinds of arcane terms. There are also about 10 rules in the Government category and about 
15 rules targeting the User namespace which are motivated by the Tiananmen Incident. 

• Miscellaneous. This small group covers terms which do not naturally fall in any other categories. 

• Non-Articles. This group contains GFW rules that target non-article pages (i.e. namespace != 0, e.g. Talk 
pages, User pages, etc.). I guess few people would have expected that China censors tried this hard to precisely 
block such pages. For example, for certain topics, GFW blocks the Talk page but not the main article. Looking 
at the page content, we see that the Talk page contains unwelcome content while the main article does not. 
This suggests that China censors had examined the Wikipedia pages diligently. Certain pages in this group are 
extremely obscure. Their presence demonstrates either that China censors have intimate knowledge of Wikipedia 
content, or that certain Wikipedia editors informed the China censors about the existence of such content. The 
author is inclined to believe the latter. 

• SELF. This is a grouping by match type, not really a grouping by subject. I separate this category because its 
rules are significant to a degree. In theory one would assume that terms blocked in this fashion must be 
hyper-sensitive, but this is not always the case. Quite a few terms in this group are not that sensitive at all. 

• URLs. GFW has a large number (estimated to be in the thousands or tens of thousands) of URLs on its 
rulebook. Here we include those URL terms that 1, affect pages in the Chinese Wikipedia, or 2, affect articles 
in the English Wikipedia. There are 37 of them, and they are not included in the total count of 919. 

Obviously, many terms fall into multiple categories. For simplicity we assign one category for each term. The 
guideline I use here is “significance” and/or specificity. For example, if a term is both in SELF and in People-Official, 
I will put it in SELF because the SELF category is more significant. The rough order of significance I use is: SELF > 
Non-Articles > Regional > People > others. 
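
The significance ordering can be written down as a short tie-breaking routine. This is only a sketch of the guideline: the report calls the order "rough", so the alphabetical fallback for categories outside the ordering, the function name, and the convention that subcategories are written as "People-Official" are all assumptions made here for illustration.

```python
# Rough order of significance stated in the text; everything else ranks below.
SIGNIFICANCE = ["SELF", "Non-Articles", "Regional", "People"]


def assign_category(candidates):
    """Pick one category for a term that falls into several.

    A candidate matches a ranked category if it equals it or is one of its
    subcategories written as "<category>-<sub>", e.g. "People-Official".
    Categories outside the ranking are tied; the alphabetical tie-break
    used here is an assumption, not a rule from the report.
    """
    if not candidates:
        raise ValueError("at least one candidate category is required")
    for top in SIGNIFICANCE:
        for c in sorted(candidates):
            if c == top or c.startswith(top + "-"):
                return c
    return sorted(candidates)[0]


print(assign_category({"SELF", "People-Official"}))
print(assign_category({"People-Dissidents", "Events"}))
```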

For rules targeting the English Wikipedia, I opted to categorize them by topic, (usually) together with their 
corresponding Chinese entries. There are only 22 rules targeting the English Wikipedia, so I’ll also list them below: 
GFW Rule                                                    Category        Note

en.wikipedia.org/wiki/Jasmine_Revolution_in_China           Events
en.wikipedia.org/wiki/Charter_08                            Events
en.wikipedia.org/wiki/Crown_Prince_Party                    Government
.wikipedia.org/wiki/Princeling                              Government      non-standard rule
en.wikipedia.org/wiki/Chen_Guangcheng                       Dissidents
en.wikipedia.org/wiki/Liu_Xiaobo                            Dissidents
en.wikipedia.org/wiki/刘晓波                                Dissidents      Liu Xiaobo, in simplified Chinese
en.wikipedia.org/wiki/劉曉波                                Dissidents      Liu Xiaobo, in traditional Chinese
en.wikipedia.org/wiki/Liu_Xianbin                           Dissidents      The Chinese article is not blocked
en.wikipedia.org/wiki/Dalai_Lama                            Tibet
en.wikipedia.org/wiki/Tenzin_Gyatso                         Tibet
en.wikipedia.org/wiki/Flag_of_Tibet                         Tibet
en.wikipedia.org/wiki/Tibetan_Independence_Movement         Tibet
en.wikipedia.org/wiki/Students_for_a_Free_Tibet             Tibet
en.wikipedia.org/wiki/Rebiya_Kadeer                         Uyghur
en.wikipedia.org/wiki/East_Turkestan_Independence_Movement  Uyghur
en.wikipedia.org/wiki/Tiananmen_Square_Protests_of_1989     Tiananmen
wiki/20th_Anniversary_Tiananmen_Square_Incident_March       Tiananmen       non-standard rule
en.wikipedia.org/wiki/Tank_Man                              Tiananmen
en.wikipedia.org/wiki/Tiananmen_Massacre                    Tiananmen
en.wikipedia.org/wiki/Tiananmen_Papers                      Tiananmen
en.wikipedia.org/wiki/Great_Wall_of_China                   Miscellaneous

Table 7.0: GFW Rulebook: Rules Targeting the English Wikipedia 


The following is the table format and notation for Section 8.1 to Section 8.11. 

• prefix.standard: means the GFW rule is zh.wikipedia.org/wiki/$term. 

• broad.standard: means the GFW rule is zh.wikipedia.org & $term. 

• blue texts are page titles of (with link to) Chinese Wikipedia pages whose canonical path is blocked. 

• green texts are page titles of (with link to) Chinese Wikipedia pages whose canonical path is not on GFW 
rulebook but a variant is. 

• brickred texts are page titles of (with link to) blocked English Wikipedia pages. 

• The first number in the last column denotes the number of Chinese Wikipedia pages that offend GFW rulebook. 
An optional second number prefixed with “EN:” denotes the number of such English Wikipedia pages. 

• As explained in Section 7.1, some entries do not affect any Wikipedia title, hence the count 0. 

• If a term is identical to a blocked page title and the rule is on the canonical path (i.e. “/wiki/”), we color the term 
in the first column and do not repeat in the third column. 

• When a term is targeted by multiple rules, we do not repeat the term on subsequent rows. We use the following 
order: “/wiki/” > “/zh/” > “/zh-cn/” > “/zh-tw/” > “/zh-hk/” > “/zh-sg/” > “/zh-hans/” > “/zh-hant/”. 

• Horizontal lines separate different entries. Terms which are close in meaning are grouped together. For example, 
if both the simplified term and the traditional term are on GFW rulebook, they are grouped together. 
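
To make the notation concrete, here is a minimal sketch of how a prefix rule and a broad rule could be evaluated against a request. It assumes a request is reduced to the string "host/path" and that matching is a case-sensitive substring test; neither detail is specified by the notation itself. The class `Rule`, the helper `sort_rules` and the term `Example` are hypothetical, and real requests for non-ASCII titles would carry percent-encoded paths, which this sketch does not handle.

```python
from dataclasses import dataclass

ZH_HOST = "zh.wikipedia.org"

# Row order used in the tables when one term is targeted by several rules.
PATH_ORDER = ["/wiki/", "/zh/", "/zh-cn/", "/zh-tw/",
              "/zh-hk/", "/zh-sg/", "/zh-hans/", "/zh-hant/"]


@dataclass(frozen=True)
class Rule:
    kind: str             # "prefix" or "broad"
    term: str
    path: str = "/wiki/"  # ignored for broad rules

    def matches(self, url: str) -> bool:
        """url is "host/path" without the scheme."""
        if self.kind == "prefix":
            # zh.wikipedia.org/<path>/$term: the title only has to start with the term
            return f"{ZH_HOST}{self.path}{self.term}" in url
        if self.kind == "broad":
            # zh.wikipedia.org & $term: both strings appear anywhere in the request
            return ZH_HOST in url and self.term in url
        raise ValueError(f"unknown rule kind: {self.kind}")


def sort_rules(rules):
    """Order the prefix rules of one term the way the tables list them."""
    return sorted(rules, key=lambda r: PATH_ORDER.index(r.path))


twitter = Rule("prefix", "Twitter", "/zh/")   # rule listed in Table 8.2
broad = Rule("broad", "Example")              # hypothetical broad rule
print(twitter.matches("zh.wikipedia.org/zh/Twitterrific"))
print(twitter.matches("zh.wikipedia.org/wiki/Twitter"))
print(broad.matches("zh.wikipedia.org/wiki/Talk:Example"))
```

The prefix rule catches every title that begins with the term on that one path, while the broad rule catches the term in any namespace and on any path, which is why broad rules tend to affect Talk, User and Category pages as well.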
8.1 Events 

Entries in this section are in reverse chronological order. I further divide the terms into the following buckets: 

• Events in 2012-2013, 3 events, 6 rules. 

• Events in 2011, 7 events, 26 rules. In particular, the Chinese Jasmine Revolution alone has 12 rules including one 
targeting the English Wikipedia, which speaks for its degree of sensitivity from China authority’s perspective. 

• Events in 2010, 6 events, 14 rules. 

• Events in 2005-2009, 7 events, 13 rules. In particular, Charter 08 has 6 rules, including one targeting the English 
Wikipedia, which speaks for its degree of sensitivity from China authority’s perspective. 

• Events before 2000, 6 events, 10 rules. It is both puzzling and funny that a historical event in 1085 (during the 
Song Dynasty) is on GFW’s rulebook. That event is not well known and the page content is not sensitive either. 


Term GFW Rule Pages Affected 

Events in 2012-2013 


"fcTpiJf prefix, standard 1 

Seven Do-not-talk, seven areas in which China authority forbids discussion, outlined in an internal memo on May 13, 2013. They are: 1, universal 
values; 2, freedom of press; 3, civil society; 4, civil rights; 5, historical mistakes of CCP; 6, crony capitalism; 7, judicial independence. 


2013^ if 

2013^ Sf 


prefix.standard 
prefix.standard 

prefix.standard 

zh.wikipedia.org/zh-tw/$term 


1 

2 2013^ «S^J!*» 
12013^ «*^JSI*» 


Southern Weekly Incident, January 2013, a major conflict between Southern Weekly editorial staff and the propaganda authority. 


ILJXTl prefix, standard 

a protest in Wansheng, Chongqing, on Apr 12, 2012. 


1 


Events in 2011 

prefix.standard 1 

zh.wikipedia.org/zh-cn/$term 
zh.wikipedia.org/zh-tw/$term 
prefix.standard 1 

zh.wikipedia.org/zh-cn/$term 

a large scale anti-corruption protest in Wukan, Guangdong from Sep 2011 to 2012. 

3l prefix.standard 3 3l 

“Fifty-cent egg”. On Oct 1, 2011, during then-Premier Wen Jiabao’s visit to a university, he asked for the price of an egg and got a reply of fifty 
cents (RMB), which was deemed too low by the general public. 

T^iSiZ^pxIpl giJIf zh.wikipedia.org/zh-cn/Sterm 1 PXJJ/IBMt 

a protest in Dalian, Liaoning Province on Aug 14, 2011 against a PX chemical project. 

prefix, standard 1 

# 

201 zh. wikipedia.org/zh-tw/Sterm 1 201 

mtn 

A high speed rail accident near Wenzhou, Zhejiang on July 23, 2011. It has had a long-lasting effect on China’s rail system and control of the press. 

r prefix, standard 1 

A riot in Xintang Township, Zengcheng, Guangdong Province on Jun 10, 2011. The trigger was the beating of a pregnant street peddler by Chengguan 
(urban management officers). 

prefix.standard 2 

zh.wikipedia.org/zh-tw/$term 


201l4pgBJ§ 

Wukan Incident, 

zh.wikipedia.org/zh-hk/$term 
prefix.standard 1 

zh.wikipedia.org/zh-cn/$term 

a protest in Inner Mongolia in May 2011 against the killing of a Mongolian environmental activist. 


t HfgflBE? np- 

Jasmine_Revolution_in_China 


prefix.standard 
zh.wikipedia.org/zh-hk/$term 
prefix.standard 
prefix.standard 
zh.wikipedia.org/zh-sg/$term 
prefix.standard 
zh.wikipedia.org/zh-cn/$term 
prefix.standard 
zh.wikipedia.org/zh-cn/$term 
prefix.standard 
zh.wikipedia.org/zh-cn/$term 
en.wikipedia.org/wiki/$term 


1 

1 

1 

1 

1 

1 

0-EN:1 Jasmine_Revolution_in_China(en) 


various terms for the Jasmine Revolution (in China), a series of nationwide protests since February 2011. 


Events in 2010 

2010-2011^11 prefix.standard 6 2010-201l¥^H®fBlL 2010-201 I^^HHtUSL 2010- 

20ll^MH3ES®Jt^M 2010-201 l^HH3E3SSL ... 

Tunisia protests from December 2010 to 2011, a.k.a. Jasmine Revolution. 

Wikileaks zh.wikipedia.org/zh-hk/Sterm 3 WikiLeaks WikiLeaksffisS^B|j , h5!%ffi¥'( i f ; Wikileaks 

zh.wikipedia.org/zh-hk/$term 1 

Wikileaks'littSiiilll^bX zh.wikipedia.org/zh-cn/Sterm 1 WikiLeaksitttlJItHl'bjbi^Lfll.SPW 

zh.wikipedia.org/zh-tw/$term 

The Wikileaks incident. These Chinese Wikipedia pages cover part of those cables related to China and Taiwan. 

2010^flSIS 0SM'/g zh.wikipedia.org/zh-hk/Sterm 1 2010^HJS. 0 

zh.wikipedia.org/zh-sg/$term 
Anti-Japan protests in October 2010 in China. 

zh.wikipedia.org/zh-cn/Sterm 1 SUM 

2010^SgMMW 5 BI zh.wikipedia.org/zh-tw/Sterm 2 2010^if; PI ’iMPT'^c 2010^i|jJXl 

zh.wikipedia.org/zh-hk/$term 
zh.wikipedia.org/zh-hant/$term 

Nobel Peace Prize in 2010, which was awarded to Liu Xiaobo on Oct 8, 2010 and outraged Chinese authority. 

SaM/lbSfct prefix.standard 2 SaJISl/LlSfct_(2010^) 

Typhoon Fanapi. Not politically sensitive, but the page contains casualty information in mainland China. 

prefix.standard 1 

zh.wikipedia.org/zh-cn/$term 

Google “exiting” China, i.e. stopping self-censorship on google.cn, in March 2010. 


Events in 2005-2009 

M0StJlrMf¥'f z b : - i l'II£ : |i : prefix, standard 1 

# 

A violent dispute between migrant Uyghurs and Han workers at a toy factory on Jun 25/26, 2009, a.k.a. Shaoguan Incident. It was regarded as 
one trigger of the violent July 5th Riot in Xinjiang. 

prefix.standard 1 

The Nuctech-Namibia scandal. Nuctech is a Chinese tech company alleged to have bribed Namibian officials. The case involved Hu Haifeng, son 
of Hu Jintao, and is heavily censored on the web. 

zh.wikipedia.org/zh/$term 1 


St'/S*! 

investigation of student casualties in the Wenchuan earthquake, initiated by artist Ai Weiwei in Dec 2008. 


083 * 5 $ 

0811 $ 

O A3£$ 

»AS$ 

Charter_08 


broad.standard 
broad.standard 
prefix.standard 
wikipedia & $term 

wikipedia & $term 
en.wikipedia.org/wiki/$term 


1 

0 

1 

3-EN: 1 Talk:# A 3*5$ Category:# A3 *g$^‘ 
$(en) 

2 -EN:l Category:#A;R$ #AH$(en) 

0-EN:1 Charter_08(en) 


- A 3fe 


various terms for Charter 08, a manifesto initiated by the Nobel Peace Prize laureate Liu Xiaobo. 


prefix.standard 1 

Yang Jia Incident. Yang Jia assaulted a Shanghai police department on Jul 1, 2008, causing six deaths. 


®£*# broad.standard 5 ®Wj#V# Talk:®'*# Talk:®'*#^# 

HI*# broad.standard 2 2007$?f 

Yilishen Incident, a large scale scam in Shenyang, Liaoning Province, resulting in large scale protest in 2007. 

prefix.standard 1 

An incident in Linyi, Shandong Province caused by brutality in enforcing the one-child policy. Chen Guangcheng organized the opposition and was 
sentenced to jail. 


Events before 2000 


broad.standard 1 

Beijing Spring, a short period of relative political openness and freedom in China, from 1977 to 1978. 


H$fii prefix, standard 1 

broad, standard 2 Talk:ffi$K;$i# 

broad.standard 1 

KiilifiiiSitt prefix.standard 1 

various terms for the “Xidan Democracy Wall”, a location in Beijing where people expressed their political views during the Beijing Spring. 


prefix.standard 1 

Prague Spring, a 1968 reform movement in Czechoslovakia. 


prefix.standard 1 

“Three Years of Natural Disasters”, this is the official term for the Great Famine (1958-1961) caused by the Great Leap Forward. 


broad, standard 2 Talk:i^T¥3£ 

broad.standard 0 

Yining Incident, a.k.a. Three Districts Revolution, Three Districts Revolt, a Soviet-backed revolt in 1944 seeking independence of Xinjiang. 
Chinese authority’s positive portrayal of this incident is at odds with its current iron-fist approach to Uyghur affairs. 


TC'ffijE'f-fc zh.wikipedia.org/zh-cn/$term 1 TCffijSf-fc 

A historical event circa 1085, during the Song Dynasty, in which the Wang Anshi Reform was abolished. This page’s content has nothing related 
to China’s current affairs. 


Table 8.1: GFW Rulebook: Events 
8.2 Media, Publications, Censorship and Circumvention 

This section covers censorship of media and press. The terms are grouped into: 

• Media, newspapers and magazines. 

• Publications, in reverse chronological order (by date of publication). There are 9 books and 2 TV series (not 
counting those related to Tibet, Uyghur and Tiananmen). 

• Censorship and censorship circumvention. It is noteworthy that GFW’s Wikipedia rulebook does not cover the 
Chinese term for the Great Firewall, its English term (“Great Firewall”), or alternative terms, e.g. the Chinese term 
for “China’s National Firewall”. However, due to GFW’s HTTP response filtering (explained in Section 5), Chinese 
users cannot really access these pages successfully, because they contain strings that are on GFW’s HTTP response 
filtering rulebook, e.g. the Chinese terms for Falungong and/or Dalai Lama. 

Also worth noting is that even though the term “Great Firewall” is not on GFW rulebook, the benign term “Great 
Wall of China” is. More precisely, the string “en.wikipedia.org/wiki/Great_Wall_of_China” would trigger a GFW 
reset. Isn’t this puzzling? 

Term GFW Rule Pages Affected 

Media, Newspapers & Magazines 

tli# prefix, standard 2 

Twitter zh.wikipedia.org/zh/$term 3 Twitter Twitterrific TwitterRAARffllMIfAIA 

zh.wikipedia.org/zh-hk/$term 

The Twitter website is blocked in China. Here we list the three GFW rules regarding Wikipedia for Twitter. 

Wifi prefix.standard 3 ftiflSfM 1*1 ftitll*! 

zh.wikipedia.org/zh-tw/$term 
zh.wikipedia.org/zh-hant/$term 

Boxun, a news resource for China related issues. 

broad.standard K 

(HI*] User_talk:^tS5§ Category:^f§^t#A ... 

broad.standard 22 ... 


Dwnews, a news resource for China related issues. The two characters of its Chinese name happen to appear in the Chinese phonetic translation of 
many foreign terms, e.g. Devillers, hence affecting a large number of Wikipedia pages. 


Rwra 

prefix.standard 

1 

Bullogger.com, a blogging website created by Luo Yonghao, which hosts many censored blog posts. 


broad.standard 

3 Talkiitllita Category:-#^ gitBaW 





broad.standard 

1 

Voice of America. 



SiftAW] 

prefix.standard 

1 

f^ftirHa 

zh.wikipedia.org/zh/$term 

i SiftStHtt 


Bloomberg company, Bloomberg News Agency, blocked due to its reporting on the wealth of Xi Jinping’s relatives. 


zh.wikipedia.org/zh-cn/$term 1 

China Digital Times, a news website operated by Xiao Qiang et al. Xiao Qiang is affiliated with UC Berkeley. 

ttS JiStt prefix, standard 1 

New Century Press, a Hong Kong based publisher, which published many books banned by China authority. 

zh.wikipedia.org/zh-cn/$term 10 SL^J 

mtmmr® ... 

New York Times, blocked due to its reporting on Wen Jiabao’s family wealth. 

AS 0 prefix.standard 4 AS 0 AS 0 flxtt AS 0 AxiScAliS. 

zh.wikipedia.org/zh-cn/$term 
People’s Daily, the official mouthpiece of the Chinese government. 

ffiiH'IMf'J prefix.standard 1 
zh.wikipedia.org/zh-cn/$term 

SiffiilfLl zh. wikipedia.org/zh-cn/Sterm 2 

Asia Weekly, a Hong Kong-based newsweekly. 

KfteHM prefix.standard 1 

Sun TV (literally sunshine satellite TV), a Chinese satellite TV station based in Hong Kong. 

zh.wikipedia.org/zh-cn/$term 1 

SAHxfl? prefix.standard 2 

zh.wikipedia.org/zh/$term 
zh.wikipedia.org/zh-cn/$term 
prefix.standard 1 

Sun Affairs, a weekly magazine based in Hong Kong. 

if AfUrUfg. prefix.standard 3 if A tWfW# ifA#if ffi IS 

zh.wikipedia.org/zh-tw/$term 
zh.wikipedia.org/zh-hk/$term 
zh.wikipedia.org/zh-sg/$term 
zh.wikipedia.org/zh-hans/$term 

Southern Metropolis Daily, usually regarded as an outspoken newspaper in mainland China. Its sister newspaper, Southern Weekly, had a major 
conflict with the propaganda authority in Jan 2013, i.e. the Southern Weekly Incident. 

JfSisfeA broad, standard 1 

fill' lij/ i/tii ri.ri broad, standard 1 

Open Magazine, a Hong Kong based monthly magazine. 

broad.standard 2 

broad.standard 1 

World Economic Herald, a pro-reform newspaper in the 1980s. It was shut down after the Tiananmen Incident. 

broad.standard 1 

SXAfiS broad, standard 1 

China News Digest, a news website operated by the Chinese diaspora in the US. 

S if A broad, standard 0 

[=J if AHfitfil broad, standard 0 

Free China Forum, a defunct web forum. A Wikipedia article with this title was deleted and no longer exists. 

Ajs^-ll broad, standard 1 

Macro Reference, a web magazine no longer in operation. It was run by 李洪宽 (Li Hongkuan). 


Publications 

MUi broad.standard 2 Talk:M5§J 

MfM broad, standard 0 

River Elegy, a documentary TV series on China history and culture, aired in 1988, banned after 1989. 


Hu_Jintao_Biography zh.wikipedia.org/zh-cn/$term 1 Hu_Jintao_Biography 

zh.wikipedia.org/zh-tw/$term 
zh.wikipedia.org/zh-hans/$term 

A Chinese-language biography of Hu Jintao by Wen Siyong and Ren Zhichu, published in 2003. 

zh. wikipedia.org/zh-cn/Sterm 1 ^JWL?IA 

Zhou Enlai’s Later Years, a banned book by Gao Wenqian published in 2003. Zhou Enlai was the first premier of People’s Republic of China. 

zh.wikipedia.org/zh-cn/$term 1 

Memoir of Mao Zedong’s Private Doctor, a memoir by Li Zhisui, published in 2005. English title is The Private Life of Chairman Mao. 

'■ prefix.standard 1 

Mao Zedong: the Unknown Story, a banned book by Zhang Rong and Jon Halliday, published in 2005. 

Sint AH prefix, standard 1 

Torrents in China, an award-winning documentary series by Japan Broadcasting Corporation, aired from 2007 to 2008. 

zfM prefix, standard 1 

Collection of Rights Defending Poems, a banned poetry book published in 2008. 

broad.standard 2 Talk:3/¥0J@ 

prefix.standard 
zh.wikipedia.org/zh-cn/$term 

Journey of Reform, a banned book published in 2009. English title is Prisoner of the State, the Secret Journal of Premier Zhao Ziyang. 

ikXCjfM—flVUfi zh.wikipedia.org/zh/$term 1 TlAiS - 

Rivers and Seas in 1949, a bestselling history book by Taiwanese author Long Yingtai, published in 2009. 

prefix.standard 1 

Who is the New China, a banned book by historian Xin Haonian, the Chinese version was published in 2012. 

PS—-iS [§!'[£§ zh.wikipedia.org/zh/$term 1 -i'50'l'ZLS 

Memoir of Chen Yizi, a banned book published in 2013. 

Censorship 

ife® zh.wikipedia.org/zh-hans/$term 4 ffeljf A/lf BIl^zXIS 

jfe/lfX® broad, standard 2 Talk^HfXfil 

Golden Shield, a surveillance project operated by the Ministry of Public Security, which is different from, but often confused with, GFW. 
prefix.standard 1 

Green Dam Lady, an anthropomorphic character created in response to the release of Green Dam, a government-developed content control software. 
broad.standard 1 

list of filtered keywords by Chinese internet software. 

prefix.standard 1 

WMAIS 

list of websites blocked by the People’s Republic of China. 

prefix.standard 1 

f&mm 

list of publications banned by the People’s Republic of China. 


Censorship Circumvention 

TOR zh.wikipedia.org/zh-cn/$term 

Tor, an online anonymity software. 

prefix.standard 

Ultrasurf, a popular GFW-circumvention proxy software. 


43 TOR TORCS ToRo Torchlight Torixoreu Tornado Toronto 
Torrent TorrentAft TortoiseJfiA TorlliX ... 


4AI?-St!lxa AJf-Si® A#?^^ 


PililH broad, standard 0 

Freegate, a popular GFW-circumvention proxy software. The simplified Chinese version is not on GFW rulebook for Wikipedia. 

tSI?-® prefix.standard 5 IttIP-iSX. t&J'P-ifinflA® 

GPass, a deprecated GFW-circumvention software. 

42H3IW] broad, standard 1 

broad.standard 0 

Garden Network, a deprecated GFW-circumvention software. 

ffiMilAlJ prefix, standard 1 

zh.wikipedia.org/zh/$term 
zh.wikipedia.org/zh-cn/$term 

Xi Xiang Project, a GFW-circumvention software/project. 


mm 


A5 


prefix.standard 

zh.wikipedia.org/zh-cn/$term 

prefix.standard 

zh.wikipedia.org/zh-cn/$term 

zh.wikipedia.org/zh-hant/$term 

prefix.standard 


3 msa mmm* 

1 mmmm 

2 


i 


various terms for GFW-circumvention technology/software, internet censorship circumvention. 


Table 8.2: GFW Rulebook: Media, Publications, Censorship and Circumvention 


8.3 Organizations, Political & Government 



This section covers: 

• Organizations (NGO’s and dissident organizations). 

• Political terms. 

• Government-related terms. 


Term GFW Rule Pages Affected 

Organizations 

Reporters Without Borders. 

broad.standard 

broad.standard 

2 TahcAHJAiAf 

3 Talk:*HI?-®:# 

fcibroad, standard 
broad, standard 

3 ^H '&MM.A. Talk:H&I User_talk:4 1 H1 ZEM'A. 

1 

China Pan-Blue Union, a political organization in mainland China established in 2004. Several organizers were sentenced to jail. 


broad.standard 9 4*III^^^-(1988) Talk:^H 

Template:HCategory:^ ... 
broad.standard 9 ^HKA3t_(' : f I ]IIA@!) AHK±3t-(A¥g!;]lll) AHKA 

China Democratic Party, a political organization established in 1998. Many organizers were sentenced to jail. 



broad, standard 

2 Talk :®AAA®# 


broad, standard 

0 

Independent Chinese PEN Center, an organization of independent Chinese authors established in 2001. 

ASM®# 

prefix.standard 

zh.wikipedia.org/zh-hk/$term 

1 

China Family Church, a.k.a. China Underground Church, churches independent from China authority. 

tsfiii 

broad, standard 

2 A SI 

a urn ash 

broad, standard 

l 

China Labor Watch. 




prefix.standard 1 

zh.wikipedia.org/zh-hk/$term 
zh.wikipedia.org/zh-hant/$term 


Chinese Blogger Conference, an annual “unconference” for Chinese bloggers, started in 2005. It was cancelled in 2010 due to interference from 
the authority. In 2012, it was moved to “the cloud”. 

^ HAHAtX-lfi&AA prefix.standard 1 

China Tianwang Human Rights Center, a rights defending organization in China, established in 2006. 

AIHA 1*1 broad, standard 1 

AH AM broad, standard 0 

June 4th Tianwang, a website operated by China Tianwang Human Rights Center. 

AA prefix.standard 1 

Open Constitution Initiative, Law Research Center. The largest NGO in China on civil rights activism. Established by Xu Zhiyong et al. in 2003. 
After being shut down by the authority in 2009, the staff changed the name to “Citizens” and continued the work. 


Political 

mm s h 


freedom of press. 


zh.wikipedia.org/zh-cn/Sterm 5 KfW § S ifrKI § iffM@S3?raA SfKI fH HJsS 

StM§£«# 


fflAMSS prefix.standard 1 

peaceful transition, a term used by China authority to describe efforts by foreign powers to subvert the Communist Party rule. 


prefix.standard 1 

public intellectual. 
SASAi broad.standard 1 

sensitive persons. 

iitif&imM prefix, standard 2 

Qincheng Prison, well known for holding political prisoners and high-ranking government officials. 



prefix.standard 


prefix.standard 


zh.wikipedia.org/zh-cn/$term 


zh.wikipedia.org/zh-hant/$term 

jailing people for their writings and speeches. 


broad.standard 


broad.standard 


hunger strike; fasting. 

A H A® broad, standard 

A HAS broad, standard 

China human rights. 


1 

1 


7 1981 AgAA£6MC®. 2006AAH£®®^@A 2006A 
I'g'fcgA 2006A^®®^® A*# ... 

15 2006AAH,*SI6'frSA Talkife-fr Cate- 

gory^ES^frfc^AHA... 


13 AHA®_(fi*R) AHARKift AHARM^J AHA® 
lAS Talk: A H ARjSsjj Template: AHA®## ... 

9 AHAS-OtfiH) AHASIS# AH ASKS AHASiS 
m AiilSfll ABASES... 


AHRAisA broad, standard 1 

AHRAtSIi broad.standard 3 Category: AH RASHS Category: AHRAiSIft XXf] 


China democracy movement. 


ah 

prefix.standard 

1 

AHUAHKAIf^J* 

zh.wikipedia.org/zh-tw/$term 
zh.wikipedia.org/zh-sg/$term 
zh.wikipedia.org/zh-hant/$term 

i aw#ai« mmm 

AHUAHUMcSm 

prefix.standard 
zh.wikipedia.org/zh-cn/$term 
zh.wikipedia.org/zh-tw/$term 
zh.wikipedia.org/zh-hant/$term 

i 

A¥ARAfnW#AH 

mim1 

zh.wikipedia.org/zh-tw/$term 

l AHARARWtAHRjyrAJA 

AHKi'nAAAA 

list of Chinese dissidents. 

prefix.standard 

l 


Government 


broad, standard 

Chinese Communist Party (CCP), in traditional Chinese. 


92 AHA**®#A AHASltlEg Talk:AHA8«S« 
Talk:#® User: AHAjS*£BM®lgAftAAIS»R® ... 


A A A A A iH prefix. standard 1 

This term means “five secretaries of CCP”, referring to Mao Zedong, Zhou Enlai, Liu Shaoqi, Ren Bishi and Zhu De, elected in 1945. 


AH prefix.standard 1 

communist bandits, a derogatory term for CCP. 


Hlilil zh.wikipedia.org/zh/$term 1 Hltlk 

Faction of the Youth League, a major faction in CCP power struggle, composed of Party cadres with leadership experience in the Youth League. 


ISA A prefix.standard 

KWH A A prefix.standard 

Central Committee of the Communist Youth League. 

AHAKAAWAIil prefix.standard 

Chinese Communist Youth League. 
AHAlSASW AHM prefix.standard 

mmx 


i 

i 


4 AHAAAAWASAASfiA AHARAAWAilAA 

f/LAlBiit AHARAAWASIIIS: 


l 
past leaders of the Chinese Communist Youth League. 


zh.wikipedia.org/zh-hant/$term 1 

Chinese New Democracy Youth League, a precursor to the Chinese Communist Youth League. 

K2m.fr rfrj prefix, standard 1 

Hu-Wen System, Hu and Wen refer to Hu Jintao and Wen Jiabao, president and premier of China from 2002 to 2012. 


Wi zh. wikipedia.org/zh-hant/$term 

“Primary Stage of Socialism”, a terminology in CCP theories. 




—prefix.standard 3 —A4A', MAXAA —AAAMAXAA 

“One Central Task”, combined with the “two basic points”, is CCP’s grand strategy: the central task is economic development, and the two basic points are (1) stick to the four core principles and (2) stick to reform and opening-up.




prefix, standard 
zh.wikipedia.org/zh/$term 
zh.wikipedia.org/zh-cn/$term
.wikipedia.org/wiki/$term
en.wikipedia.org/wiki/$term


1 


Princeling 
Crown_Prince_Party 

descendants of top Party officials in China, many of whom hold high-level political and business positions.


0-EN:2 Princeling(en) Princelings(en)
0-EN:1 Crown_Prince_Party(en)


prefix, standard 

zh.wikipedia.org/zh-cn/$term

zh.wikipedia.org/zh-tw/$term

web commentators organized and/or sponsored by the government. 


2 X) 


4 1 4^ 4 1 A jfcfil nP broad, standard 

Propaganda Department of the Central Committee of CCP. 


tarn 




zh.wikipedia.org/zh-cn/$term 1

prefix, standard 1 

All-China Women’s Federation. The term is not sensitive, but its Wikipedia page contains a June 4th Incident template, which was added on Oct 
23, 2011. This suggests that GFW added this rule after this date. 

prefix.standard 1 

All-China Federation of Returned Overseas Chinese. Non-sensitive term, same story as the previous entry.


zh.wikipedia.org/zh-hk/$term 

China Law Society. Non-sensitive term, same story as the previous two entries. 


4 1 HS# A 


CCP Central Guard Bureau. 


prefix.standard 

prefix.standard 


prefix.standard 

National Defense Mobilization Commission. 


prefix.standard 3 AH AHASW 

Chinese People’s Liberation Army General Logistics Department, one of four General Departments of PLA. The other three are not blocked:
General Staff Department, General Political Department, General Armaments Department.


AKzftfP H zh.wikipedia.org/zh-cn/$term 

prefix.standard 

list of corruption cases in People’s Republic of China. 


2 AAAAAfnHgt?4*f43A 

l 


2012 A AAAAAfnH prefix.standard 1 

zh.wikipedia.org/zh-tw/$term

list of corruption cases in People’s Republic of China in 2012. Other years are not blocked. 


A^ARAfflH^'S prefix.standard 

Constitution of People’s Republic of China. 


2 
8.4 People

This section covers names of people, excluding those related to Hong Kong, Taiwan, Tibet, Uyghur, Falungong and the
five terms in the SELF category (see the respective sections). I group these names into the following subcategories:

• Dissidents, further divided into: 

• “Newer” terms (see explanation below). 

• “Older” terms (see explanation below). 

• Government officials, further divided into: 

• Names (and nicknames) of the 9 members of the 17th Politburo Standing Committee, and their relatives. 

• Names of the 16 members of the 17th Politburo not in the Standing Committee.

• Names of past top Party leaders. 

• Other lower-level government officials, whose names are blocked either due to scandals or their close 
relationship to the censorship apparatus. 

• Miscellaneous: a few names of people who are neither dissidents nor government officials.

For dissidents, I observe two types. The first type seems to be “newer” and is blocked mainly by prefix rules. The second
type seems to be “older” and is blocked by broad rules; furthermore, both the simplified and traditional versions are blocked
(by broad rules), even in cases where Wikipedia does not have an article for one version. My hypothesis for the
“older” type is that the censorship officials handed a (long) list of names to the GFW operator (probably in a single
“transaction”), who did the due diligence and blocked both the simplified and traditional versions using the broad rule.
For the “newer” type, it is likely that the censorship officials gave URLs to the GFW operators, presumably over many
transactions.

Note that a few people likely received both treatments: they are on the “broad rule” list, but GFW “revisited” them under
alternative names (or Wikipedia titles). I list these cases in the “Newer” section.

The rationale for dividing the Officials category into these four groups is that the GFW rulebook gives standard treatment
to the names of the members of the 17th Politburo (in office from 2007 to 2012). All 25 of these names are
blocked by the broad rule, in both the simplified and traditional versions. For the newer batch (i.e. the
18th Politburo from 2012) or older batches (16th Politburo or earlier), there is no such standard treatment.
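
As an illustration of the rule types discussed in this section, the three rule forms that appear in the “GFW Rule” column of the tables can be sketched in Python. This is a minimal sketch under stated assumptions, not the GFW's actual matching logic: it assumes that a broad rule fires when the term appears anywhere in a wikipedia.org page title, that a prefix rule fires when the title starts with the term, and that a URL rule such as en.wikipedia.org/wiki/$term fires when the requested URL starts with the pattern after $term is substituted. The last assumption is consistent with entries where one URL rule affects several pages whose titles share a prefix. The function names are hypothetical, and the “standard” encoding qualifier is ignored.

```python
def matches_broad(term: str, host: str, title: str) -> bool:
    # Assumption: a broad rule fires on any wikipedia.org host when the
    # term appears anywhere in the page title (Talk:, User:, Category: ...).
    return host.endswith("wikipedia.org") and term in title


def matches_prefix(term: str, host: str, title: str) -> bool:
    # Assumption: a prefix rule fires only when the title starts with the term.
    return host.endswith("wikipedia.org") and title.startswith(term)


def matches_url_rule(pattern: str, term: str, url: str) -> bool:
    # Assumption: a URL rule is the pattern with $term substituted, matched
    # as a prefix of the request URL (scheme already stripped).
    return url.startswith(pattern.replace("$term", term))


if __name__ == "__main__":
    term = "Liu_Xiaobo"
    for title in ["Liu_Xiaobo", "Liu_Xiaobo_(intellectual)", "Talk:Liu_Xiaobo"]:
        print(title,
              matches_broad(term, "en.wikipedia.org", title),
              matches_prefix(term, "en.wikipedia.org", title))
```

Under these assumptions a broad rule also catches Talk, User and Category pages that mention the term, which matches the “Pages Affected” column for broad rules in the tables below, whereas a URL rule is confined to a single language variant path.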


Term    GFW Rule    Pages Affected

Dissidents (newer terms) 


prefix.standard 
zh.wikipedia.org/zh/$term 
zh.wikipedia.org/zh-cn/$term
zh.wikipedia.org/zh-tw/$term


Ai Weiwei, dissident artist. 




zh.wikipedia.org/zh/$term
zh.wikipedia.org/zh-cn/$term
zh.wikipedia.org/zh-tw/$term
zh.wikipedia.org/zh-hk/$term
zh.wikipedia.org/zh-sg/$term

i feliff 

Bao Zunxin, (1937 - 2007/10/28), prominent dissident scholar. 


4 t Si _(ff A) 

zh.wikipedia.org/zh-tw/$term

1 4t^_(itA) 

Bei Dao, dissident poet, relocated back to Hong Kong in 2007. 



prefix.standard 

zh.wikipedia.org/zh/$term

zh.wikipedia.org/zh-hans/$term

l 

PfAli 

prefix.standard 

2 PfAMiSH## 


Chen_Guangcheng en.wikipedia.org/wiki/$term 0-EN:1 Chen_Guangcheng(en)

Chen Guangcheng, the legendary blind civil rights lawyer/activist, who escaped from house arrest and fled to the U.S. Embassy in Beijing in April 2012.

I5f zh.wikipedia.org/zh/$term 2 

Chen Yizi, dissident economist, who published a memoir in May 2013. The simplified version of his name is not in the GFW rulebook.

prefix, standard 1 

Cheng Xiang, journalist, arrested for espionage in Apr 2005, released in Feb 2008. 

T"F;R broad, standard 2 Talk 

zh-yue.wikipedia.org/wiki/$term

Ding Zilin, organizer of Tiananmen Mothers, a group of relatives of victims in the June 4th Incident. One of only four terms blocked on the Yue 
(Cantonese) Wikipedia. 

ttlrlH zh.wikipedia.org/zh/$term 1 t±^IE 

zh.wikipedia.org/zh-tw/$term
zh.wikipedia.org/zh-sg/$term

Du Daozheng, pro-reform Party veteran, president of the magazine Yan Huang Chun Qiu. 

zh.wikipedia.org/zh-cn/$term 1

zh.wikipedia.org/zh-hk/$term

Feng Zhenghu, activist. Feng was refused re-entry into China in 2009; in protest, he remained in Narita International Airport (Tokyo) for 92
days, which attracted worldwide media attention.

'KISi'ffll prefix, standard 1 

broad, standard 1 

Hou Dejian, Taiwanese musician. June 4th Movement participant. The first is his correct name, the second is a misspelling. 

zh.wikipedia.org/zh-cn/$term 1

Jia Jia, dissident, father of Jia Kuo (see below). 

M I®] prefix, standard 1 

Jia Kuo, dissident, son of Jia Jia (see above). 

prefix, standard 1 

zh.wikipedia.org/zh/$term 

Jiang Weiping, journalist, jailed for six years for reporting secrets of Bo Xilai.

WPZfk zh. wikipedia.org/zh-cn/Sterm 1 M07K- 

zh.wikipedia.org/zh-tw/$term
zh.wikipedia.org/zh-sg/$term
zh.wikipedia.org/zh-hans/$term

Jiang Yanyong, a high-ranking doctor in the PLA medical system. He leaked the truth about the SARS pandemic to the media in Apr 2003.

2^0£PJ§ prefix.standard 3 2^H£PJ§2^fl£PJ§® § SVfr 

zh.wikipedia.org/zh-cn/$term

Li Wangyang, (1950 - 2012/06/06), an activist who spent 22 years in jail for organizing protests in 1989. His sudden death in Jun 2012 was claimed
to be a suicide by the Chinese authorities, a claim that convinced few people. The incident drew major attention in Hong Kong.

prefix, standard 1 

Liang Haiyi, activist, jailed for her activism in Chinese Jasmine Revolution. 

If (ff Mi prefix, standard 1 

Liao Yiwu, dissident poet and writer. He fled China in Jul 2011. 

PJR-lS'JBjSSitilr) zh. wikipedia.org/zh-sg/Sterm 1 

zh.wikipedia.org/zh-hans/$term

Liu Xia, artist, wife of Liu Xiaobo. 

broad, standard 2 Talk^lJfj^iS 

en.wikipedia.org/wiki/$term

en.wikipedia.org/wiki/Sterm 0-EN:l PJBj§S[(en) 

zh.wikipedia.org/zh-hk/$term
zh.wikipedia.org/zh-tw/$term
zh-yue.wikipedia.org/wiki/$term


PJBI/g 
Liu_Xiaobo


en.wikipedia.org/wiki/$term 


0-EN:4 Liu_Xiaobo(en) Liu_Xiaobo_(intellectual)(en)
Liu_Xiaobo_(taekwondo)(en) Liu_xiaobo(en)

Liu Xiaobo, prominent dissident scholar, 2010 Nobel Peace Prize laureate. 

Liu_Xianbin en.wikipedia.org/wiki/$term 0-EN:2 Liu_Xianbin(en) Liu_xianbin(en)

Liu Xianbin, dissident. He was sentenced to jail three times, most recently in Mar 2011. Surprisingly, the corresponding Chinese terms are not in the GFW rulebook.

/iJiUK prefix.standard 1 

Liu Yiming, author, arrested for allegedly spreading rumors in the “70 kmph Incident” in May 2009.


SSI 


1 


prefix.standard 
zh.wikipedia.org/zh-tw/$term
gan.wikipedia.org/wiki/$term

Pangu Band, an avant-garde punk band. This term is the only one targeting the Gan (Jiangxi dialect) Wikipedia. A related term “tStASIA” (in 
simplified Chinese) is blocked on its own. 

WS33 prefix.standard 1 

zh.wikipedia.org/zh-sg/$term
zh.wikipedia.org/zh-hans/$term
Pu Zhiqiang, rights defending lawyer.

prefix.standard 1 

Ran Yunfei, writer, detained briefly in Chinese Jasmine Revolution. 


J'J'ixJ prefix.standard 

Sun Wenguang, dissident scholar. 


1 


ilSA zh. wikipedia.org/zh-cn/Sterm 1 i?SA 

Tan Zuoren, activist, sentenced to jail in Feb 2010, for investigating tofu-dregs schoolhouses in the Sichuan Earthquake. 


Tang Baiqiao, dissident. 


zh.wikipedia.org/zh-tw/$term


1 


AS 


AS-(K:SA±) 

AS-lKtllAd:) 


prefix.standard 

zh.wikipedia.org/zh/$term
zh.wikipedia.org/zh-sg/$term
broad, standard 
broad, standard 


9AS_(1969g) AS-OT) AS_(K;i£A±) AS_(K:iSA 
±) AS_('/liiKii) ASAStAisSiS) ASM ASS 


Wang Dan, student leader in the Tiananmen Protest in 1989. 


Bfift broad.standard 3 Talk:S"fe 

prefix.standard 1 

Tsering Woeser, Tibetan writer. SMA.v an obscure alternate name ofl%(fi. 


gS S prefix.standard 1 

Xia Yeliang, dissident, who published an open letter to Liu Yunshan, then-Party Propaganda Chief in May 2009. 


SStK zh. wikipedia.org/zh-cn/Sterm 1 ASA 

Xu Zhiyong, prominent legal scholar and activist, an iconic figure in China’s New Citizen Movement. 

IS g broad, standard 2 Talk: |i§] $3 g 

Yan Mingfu, Party veteran, he was removed from his posts after the Tiananmen crackdown. 


prefix.standard 1 

Yang Yinbo, a “post-80” writer, the youngest entry in GFW’s Wikipedia rulebook.


1 


SjblSg prefix.standard 

zh.wikipedia.org/zh/$term
zh.wikipedia.org/zh-cn/$term
zh.wikipedia.org/zh-hans/$term

Yao Jianfu, scholar, authored memoir of Chen Xitong, the jailed former Beijing Party head, in May 2012. 

“rlSfliSz: prefix.standard 1 

Zhang Boli, a leader in the Tiananmen Protest in 1989. He fled China in 1991, later converted to Christianity and became a popular pastor. 
‘jUttfif prefix, standard 1

zh.wikipedia.org/zh-sg/$term
zh.wikipedia.org/zh-hans/$term

Zhang Shijun, a PLA officer who experienced the Tiananmen crackdown in 1989. He was the first PLA officer to publicly denounce the crackdown. He
published an open letter to Hu Jintao under his real name in Mar 2009.

MUM zh. wikipedia.org/zh-cn/Sterm 1 

Zhao Lianhai, activist, organizer of "Home for Kidney Stone Babies", a group of families of victims of the melamine-tainted milk scandal in 2008.

broad.standard 5 r Ta\k\MW,Wt UserJalkM^roftS 1 * Category: MW PH 

Category:MHKPBfflftftff 

MWPB broad.standard 3 Talk:®?® 

Mflsillt prefix, standard 1 

Zhao Ziyang, (1919 - 2005/01/17), former premier and CCP general secretary, a pro-reform politician who was removed from post in 1989 and remained 
under house arrest until his death in 2005. (Zhao Xiuye) is an obscure alternate name. 

ttnJSW! prefix, standard 1 

Zong Fengming, (1920 - 2010/01/07), Party veteran, author of Interviews with Zhao Ziyang, a book published in Jan 2007 in Hong Kong.

Dissidents (older terms) 

i&M broad, standard 2 Talk MB 

broad, standard 1 

Bao Tong, Party veteran, secretary of Zhao Ziyang. 

WftW broad, standard 1 User:WftW 

Wftilf broad, standard 2 Talk: lift if 

Cao Chongqing, dissident writer. 

broad.standard 2 Talk:^£^ 

Chai Ling, student leader of the Tiananmen Protest in 1989.

F5K27IS5 broad, standard 1 

PHH® broad.standard 2 Talk: PUS?® 

Chen Kuide, dissident scholar. 

broad.standard 3 Talk:|?f^0J User_talk:P!iftBJ] 

ISt-pHl] broad, standard 1 

Chen Ziming, dissident. 

broad, standard 2 Talk: 77®^, 

broad, standard 2Talk:75*)K \X. 

Fang Lizhi, (1936 - 2012/04/06), dissident scholar, iconic figure of Tiananmen Protest in 1989 and Chinese democracy movement. 

77 HI broad.standard 15 7fIH_(1948^) TflffllS UsenTi GSM 

User_talk:7j ESI User_talk:77 BlTl^P User_talk:77 Bis ... 

JJU broad.standard 5 ^Ht± Talk:* Ht± 

Talk:*l!$§*te User_talk:^BI^.H 

Fang Yuan, dissident. His name means “square” and “circle”, which appear in many phrases in Chinese.


mm 

mm 

broad, standard 

broad, standard 

2 Talk:£f MJg 

1 

Feng Congde, student leader in Tiananmen Protest in 1989. 


it™ 

broad, standard 

1 

Gao Yu, female dissident journalist. 


|"rJ ^=3 Mic 

broad, standard 

2 Talk:* 

Gao Zhisheng, rights defending lawyer. 


ff/K* 

mMU 

Han Dongfang, activist. 

broad, standard 

broad, standard 

1 

1 

W^_(1973¥) 

Hu Jia, activist. 

broad, standard 

1 
iWA (fSSG broad, standard

Hu Ping, dissident scholar. 

2 Talk:SJT^*) 

ft broad, standard 

JfSf broad, standard 

Huang Qi, activist. 

2 Talk: ft 

1 

ij?fnp® broad, standard 

fffcfi® broad, standard 

Jiang Pinchao, dissident. 

2Talk:#p n p® 

1 

H B# broad, standard 

2Talk:ilBlfT 


broad, standard 0 

Jiao Guobiao, dissident, former Peking University professor, published an article “Denouncing the Central Propaganda Department" in Mar 2004. 


-KiSriC broad, standard 1 

Li Anyou, Chinese name of Andrew Nathan, an American scholar specializing in Chinese issues.


broad, standard 

SpUHt broad, standard 

Li Hongkuan, dissident. 

1 

0 

SpIS broad, standard 

A IS broad, standard 

Li Lu, student leader in the Tiananmen Protest in 1989. 

0 

1 


broad.standard 2 Talk:^fJiA 

Li Shenzhi, (1923 — 2003/04/22), scholar, a prominent exponent of Chinese liberalism. 


kllSiffi broad, standard 

ffflJtJft broad, standard 

2 TalkAJUJfS 

0 


Liu Binyan, (1925 - 2005/12/05), dissident, prominent figure in the Tiananmen Protest in 1989. 


^iJRlJ broad, standard 2 User Talk:User_talk: RI'J PdT'ftJ 


PJBJ broad.standard 5 PJ|f!L(P:illAi) PJBl-liSSj It) Talk:PJPf!| 

Liu Gang, student leader in the Tiananmen Protest in 1989. This is a very common Chinese name. 


JvlUM broad.standard 2 Talk:Iff* 


JfiSH broad, standard 1 

Qi Zhiyong, activist, injured in the June 4th crackdown in 1989. 


broad, standard 2 Talk: US5^ 


broad, standard 1 

Shi Tao, journalist, arrested in 2004 and jailed until Aug 23, 2013. His conviction involved Yahoo’s cooperation with the Chinese authorities, causing
huge damage to Yahoo’s reputation.


zE'ffiiS broad.standard 2 AffilACAtib'ili) 

Wang Bingzhang, prominent activist in China's democracy movement, currently serving a life sentence.


broad, standard 

broad, standard 

Wang Juntao, prominent dissident scholar. 

i 

0 

broad, standard 

Wang Lixiong, writer, husband of Woeser. 

2 Talk: A A It 

broad, standard 

Wang Ruowang, (1918 - 2001/12/19), dissident. 

2 TalkAAM 

3£5Cfn broad, standard 

Wang Wenyi, reporter of Epoch Times. 

2 Talk:I£t6 

broad, standard 

Wang Youcai, dissident. 

2 Talk 


WiLMQi broad, standard 2 Talk:|S^P,^£ 

Wei Jingsheng, dissident, the first imprisoned activist after the end of the Cultural Revolution. 



JClfe broad, standard 

3 Talk:3tife 


Wen Yan, a “post-80” activist, organized China Pan-Blue Union. 


Jc'jAiS broad, standard 

broad, standard 

Wu Hongda, dissident. 

1 

2 Talk:^3Aji 

broad, standard 

broad, standard 

Wuer Kaixi, student leader of the Tiananmen Protest in 1989. 

2 Talk 

1 

broad, standard 

2 Talk^H^ 


broad, standard 1 

Xin Haonian, historian, author of the banned book “Who is the New China”.


If rm broad, standard 

1 

Xiong Yan, student leader of the 1989 Tiananmen Protest. 


fr broad, standard 

2 Talk:#!^ 


broad, standard 2 Talk:^^^ 

Xu Jiatun, Party veteran, was CCP’s Party head in Hong Kong from 1983 to 1989, defected after the Tiananmen crackdown. 


broad, standard 

SkW-KK broad, standard 

Yan Jiaqi, prominent dissident scholar. 

2 Talking 

1 

fJjllPPJ broad, standard 

broad, standard 

Yang Jianli, dissident activist. 

1 

1 

Wl'ffl broad, standard 

w)M broad, standard 

Yang Lian, dissident poet and writer. 

2 Talk:®;® 

1 

broad, standard 
ttUK broad, standard 

Yao Yongzhan, student leader in student protest in 1989. 

0 

2 Talk:ttl® 

broad, standard 

4 Talkie'S User_talk:£*@J Wikipedia:*f$^c/ig^:^*^ 

Yu Jie, dissident writer.


MX IftK broad, standard 

iS.fS.llK broad, standard 

Yuan Hongbing, dissident. 

2 Tsfk-MttlK 

1 

broad, standard 

3 Talk:iS^BJ UserJalk:®^ 


broad, standard 1 

Yuan Zhiming, dissident writer, co-author of River Elegy, who converted to Christianity and became a popular pastor.


Siti-rfSl broad, standard 

Sjtfnfn broad, standard 

Zhang Yihe, writer.

2 Talk:?tefn 

1 

JW| H1®. broad, standard 

2 Talk:JW| HI® 


JIMS broad, standard 0 

Zhou Guocong, a student protester in Chengdu in 1989, who was beaten to death in police custody. 


Government Officials (Members of the 17th Politburo Standing Committee) 


broad, standard 

7 Talk:58ISi^ User:58f§l^ User_talk:S 8 f 8 i§ Cate- 

$81®?* broad, standard 

gory:$8§r|?lf Category:$8®WS?^ 

6 58fS»tSi® User:58iS»T^ User:58iS»StS^-®-^g 


S User:58ii@Ji?w8 User_talk:630iiH£58§ll$S 

Hu Jintao, President of China and General Secretary of CCP from 2002 to 2012. 


zh. wikipedia.org/zh-cn/Sterm 2 
Hu Haifeng, son of Hu Jintao, involved in the Nuctech-Namibia scandal. 
zh.wikipedia.org/zh/$term
zh.wikipedia.org/zh-cn/$term
zh.wikipedia.org/zh-hans/$term
zh.wikipedia.org/zh-hant/$term
Hu Haiqing, daughter of Hu Jintao. 

1 

broad, standard 

HIPPII11 broad, standard 

3 Talk: User:^^PH 

1 


Wu Bangguo, famous for the “Five Don’ts” in 2011. The term for the “Five Don’ts” itself is not blocked.


imfKS broad, standard 

® W TalkiiS^S Talk:}£St3Sf3®W ... 

broad, standard 

Ml® prefix, standard 

zh.wikipedia.org/zh-cn/$term
zh.wikipedia.org/zh-tw/$term
zh.wikipedia.org/zh-sg/$term
zh.wikipedia.org/zh-hans/$term
imlPw prefix.standard 

zh.wikipedia.org/zh-cn/$term
zh.wikipedia.org/zh-tw/$term
zh.wikipedia.org/zh-sg/$term

2 WMtMM 

2 iSHfi 

1 


zh.wikipedia.org/zh-hans/$term

Wen Jiabao, Premier of China from 2002 to 2012. One of the terms is short for “Premier Wen”; another means “best actor Wen”.


“rlxinflj prefix, standard 

zh.wikipedia.org/zh/$term
zh.wikipedia.org/zh-cn/$term
zh.wikipedia.org/zh-tw/$term
zh.wikipedia.org/zh-hk/$term
zh.wikipedia.org/zh-sg/$term
zh.wikipedia.org/zh-hans/$term
zh.wikipedia.org/zh-hant/$term
Zhang Peili, wife of Wen Jiabao. 

1 

broad, standard 

Wen Yunsong, son of Wen Jiabao. 

1 

broad, standard 

1US5W broad, standard 

Jia Qinglin. 

3Talk:^lA# UserMK# 

1 

broad, standard 

broad, standard 

Li Changchun, Propaganda Chief from 2002 to 2012. 

2 Talk:^3£# 

1 

33 iST broad, standard 

4 Talk:3]3fiT User:3}^ 


SiS 1 ! 1 broad.standard 10 Talk:!/ Category: 

Xi Jinping, President of China and General Secretary of CCP from 2012. 


prefix, standard 

Peng Liyuan, singer, wife of Xi Jinping.

1 

>\ Btf prefix, standard 

Xi Zhongxun, communist revolutionary, father of Xi Jinping.

1 

33 BfiiP prefix, standard 

zh.wikipedia.org/zh/$term
zh.wikipedia.org/zh-cn/$term
zh.wikipedia.org/zh-tw/$term
zh.wikipedia.org/zh-sg/$term
zh.wikipedia.org/zh-hans/$term

1 
Xi Mingze, daughter of Xi Jinping.

’MWiWs zh.wikipedia.org/zh/$term 1 

zh.wikipedia.org/zh-cn/$term
zh.wikipedia.org/zh-hk/$term
zh.wikipedia.org/zh-sg/$term
zh.wikipedia.org/zh-hans/$term

Qi Qiaoqiao, Xi Jinping’s sister, whose wealth was reported by Bloomberg in Jun 2012. 

zh.wikipedia.org/zh/$term 1

zh.wikipedia.org/zh-cn/$term
zh.wikipedia.org/zh-tw/$term
zh.wikipedia.org/zh-hk/$term
zh.wikipedia.org/zh-sg/$term
zh.wikipedia.org/zh-hans/$term
zh.wikipedia.org/zh-hant/$term

Deng Jiagui, Xi Jinping’s brother-in-law, whose wealth was reported by Bloomberg in Jun 2012. 


broad, standard 
^SJtL 33 broad, standard 

Li Keqiang, Premier of China from 2012. 


3 Talk:^]S33 Category:^]nC33 

5r#ti 


®H33 

eaa 

He Guoqiang. 

broad, standard 

broad, standard 

2 Talk:® 1133 

1 

if?kit 

broad, standard 

6jfzkIL(JgB) Jf7XlL(#§HlK'/&Ail) TalfcJfzkJi User:Jf 
Alt UserJalkMAIt 

Zhou Yongkang, China's Police Chief between 2007 and 2012. 


Government Officials (Members of the 17th Politburo) 



M3 

broad, standard 

ioML(i964^fcb£) M3-(A2?gM) MLoKinAt!) A 
HUSH) Talk: M3-(MM) 

Ml 

broad, standard

Wang Gang. This is a very common Chinese name.

2 AiLdMfflS*) IB3-(®^AtD 

Wang Lequan. 

broad, standard 

broad, standard 

2 Talk:MU: 

1 

MB 

MH 

Wang Zhaoguo. 

broad, standard 

broad, standard 

2 Talk:MB 

1 

Mill 

Wang Qishan. 

broad, standard 

2 Talk:3ERlll 

0E5 

Hui Liangyu. 

broad, standard 

2 Talk: HE A 

mm 

Liu Qi. 

broad, standard 

broad, standard 

2 Talk: MR 

1 

broad.standard 4 Talk:^ij‘Sill User:^>J‘Sill User_talk:^lj‘Sill 

10# Li] broad, standard 1 

Liu Yunshan, director of the Propaganda Department of the CCP Central Committee from 2007 to 2012, then promoted to the Standing Committee of the 18th Politburo.

MMM 

Liu Yandong. 

broad, standard 

broad, standard 

1 

1 


broad, standard 

3 Talk:^®il8 UserJalk:^®?® 
Li Yuanchao.


jEE'ff broad, standard 

25 ?*#_(#¥) ffiff-CMiKfi) ffi 

Wang Yang. The word also means “ocean”, hence very common.


ilj IJB broad, standard 

311 rail broad, standard 

Zhang Gaoli. 

2 Talk:^]®S 

1 

‘Hx'fUl'I broad, standard 

broad, standard 

Zhang Dejiang. 

2 Talk:?K1irI 

1 

broad, standard 
#IE# broad, standard 

Yu Zhengsheng. 

2 Talk: fjIE* 

1 

If- broad, standard 

Xu Caihou. 

2 Talkifi^jy 

broad, standard 

Guo Boxiong. 

2 Talk Mi&M 

#,®* broad, standard 

12 #J®**®[ #j?B**# #J®*3g Talk:#®* Talk:#® 
**®; Talk:#j®*^# Template:#J?R* Category:#®* ... 


?SS!?B2f5 broad.standard 4 User:S!?§^K User_talk: W- j?S 

Bo Xilai, ex-Party chief of Chongqing, sentenced to life in prison in Sep 2013 due to the Neil Heywood murder case and Wang Lijun Incident. 


Officials (Past High Ranking Officials) 


*** zh.wikipedia.org/zh-hans/$term 

10 7*h¥ *****^ >t|S 

iqs/hWft ... 

prefix, standard 

zh.wikipedia.org/zh-hans/$term 

1 


Deng Xiaoping. (Deng Bin) is an obscure alternate name of Deng Xiaoping. 


prefix, standard 
zh.wikipedia.org/zh-tw/$term
Deng Zhuodi, grandson of Deng Xiaoping. 

1 

If! H broad, standard 

Hua Guofeng, passed away on Aug 20, 2008. 

1 

77 M wikipedia.org & Sterm 

74-EN:2 7j 3L(ffiiSt*.) 77S* 77S*:® 77JIK 

**S II77S Talk:77S 7?M(en) 77S*M(en) ... 


Wan Li. The word means ten thousand (Chinese) miles, appearing in many phrases, in particular, 77 (ten thousand miles of Great Wall). 


CLffK broad, standard 

9 TalUlffR User:E@S 

broad, standard 

File:£C#R*iS®l#^tllJ.jpg Category:?!#!^; ... 

6 


tLPPTemplate:UserJIP? 
Jiang Zemin, President of China and General Secretary of CCP from 1989 to 2002. 


prefix, standard 

Jiang Mianheng, son of Jiang Zemin. 

1 

yl trfiM zh. wikipedia. org/zh-cn/$term 

Jiang Miankang, son of Jiang Zemin. 

i 

*11 prefix, standard 

zh.wikipedia.org/zh/$term 
zh.wikipedia.org/zh-hk/$term
zh.wikipedia.org/zh-sg/$term
*11 prefix, standard 

zh.wikipedia.org/zh-cn/$term
zh.wikipedia.org/zh-sg/$term

5*H_(MiK*) *l§T&«*if *Kffl *1 ISt 

4 *8i/\E 0 IE *I§M 
zh.wikipedia.org/zh-hans/$term
Li Peng, Premier of China from 1988 to 1998. 

prefix.standard 1 

2 j£frt}IJt prefix.standard 1 

Li Ruihuan. 




Liu Huaqing. 


zh.wikipedia.org/zh-tw/$term 1 

prefix, standard 1 

zh.wikipedia.org/zh-hk/$term


prefix.standard 4 SfffcfiLC^C) eillXt I-(^) 

zh.wikipedia.org/zh/$term
zh.wikipedia.org/zh-cn/$term
zh.wikipedia.org/zh-tw/$term
zh.wikipedia.org/zh-hk/$term

prefix.standard 2 Kll) 

zh.wikipedia.org/zh-cn/$term


Officials (other) 

wmm 

Cao Jianming. 

zh.wikipedia.org/zh-cn/$term

it mm 


prefix.standard 

zh.wikipedia.org/zh/$term

zh.wikipedia.org/zh-cn/$term

i 

Chen Shiju, secretary of Hu Jintao. 



prefix.standard 

i 

Chen Xitong, ex-Party Chief of Beijing, jailed for corruption. 


iSJgffl 

Chi Haotian. 

prefix.standard 

i 

im 

prefix.standard 

i 

He Ting, successor to Wang Lijun as Chongqing Police Chief. 



prefix.standard 
zh.wikipedia.org/zh-cn/$term
zh.wikipedia.org/zh-hk/$term
zh.wikipedia.org/zh-sg/$term
zh.wikipedia.org/zh-hans/$term

i 


prefix.standard 

i 


zh.wikipedia.org/zh-cn/$term
zh.wikipedia.org/zh-tw/$term


Li Hongzhong, Party Chief of Hubei Province, who became famous for grabbing the recording pen of a female reporter in Mar 2010. 


tJBl 

f®;tl 

Zeng Qinghong. 


JtllSi 1 zh.wikipedia.org/zh-tw/$term 1 

Su Zhi, ex-Party Chief of Urumqi, capital city of Xinjiang Uyghur Autonomous Region. He was removed from the post in September 2009.

prefix, standard 1 

prefix, standard 1 

Ling Jihua, Hu Jintao’s secretary. His son died in a car crash on Mar 18, 2012, a major political scandal. 


broad.standard 2 Talk: 4^ 

Ling Gu, son of Ling Jihua, died in a car accident on Mar 18, 2012. 


prefix, standard 1 

Liu Hongwei. 


m 


broad, standard 


1 


Lou Qinjian, former Deputy Minister of Industry and Information Technology. Lou does not seem to be involved in any scandal; his appearance in GFW’s
rulebook may simply mean that he is personally close to the GFW operation.




prefix, standard 


1 
IS ST prefix, standard 1

zh.wikipedia.org/zh-cn/$term
zh.wikipedia.org/zh-hans/$term


Ma Wen, a female Party official who handled the Bo Xilai case.


fSJtS zh.wikipedia.org/zh-tw/$term 

Qian Qichen, Foreign Minister from 1988 to 1998. 

gSS-(S R£) 


broad, standard 

2 Talk 


broad, standard 

1 

Song Renqiong. 




prefix.standard 1 

zh.wikipedia.org/zh/$term
zh.wikipedia.org/zh-cn/$term
zh.wikipedia.org/zh-tw/$term

ISHUffi) prefix, standard 1 

zh.wikipedia.org/zh/$term
zh.wikipedia.org/zh-cn/$term
zh.wikipedia.org/zh-hk/$term
Tao Siju, Minister of Public Security from 1990 to 1998. 

IMM prefix, standard 1 

zh.wikipedia.org/zh/$term
zh.wikipedia.org/zh-hk/$term
zh.wikipedia.org/zh-sg/$term

Tuo Zhen, Propaganda Chief of Guangdong Province, responsible for the Southern Weekly Incident in Jan 2013. 

L% prefix, standard 4 HiSAZptftfr 3; - Sill' ' Ixt If IS 

ISA?* 

Wang Lijun, ex-Police Chief of Chongqing, right-hand man of Bo Xilai. His defection to the US Consulate in Feb 2012 started the biggest 
political turmoil in recent years, which led to the downfall of Bo Xilai. 

‘jlCiZ.jii prefix, standard 1 

Zhang Lichang, infamous ex-Party Chief of Tianjin. 

jW|5S prefix, standard 1 

Zhou Qiang. 

Miscellaneous 

broad.standard 4 User_talk:fic;fiYf:£l' ; P Category 

Fang Zhouzi, an internet celebrity, famous for picking fights with other celebrities, usually in the name of fighting fraud. 

broad.standard 3 Talk:®^!^ User_talk:^5fl^^ 

Song Binbin, daughter of Song Renqiong. She is widely regarded as responsible for the beating to death of Bian Zhongyun in the Cultural Revolution.

2YT Y zh. wikipedia.org/zh-tw/Sterm 1 zE'hY 

zh.wikipedia.org/zh-sg/$term

Wang Xiaoya, a TV host, wife of Cao Jianming. Rumors say she was involved in the corruption case of Wang Yi. 

3EM: prefix.standard 1 

Surname Wang. This page is about the family name Wang. 

prefix.standard 1 

Yu Liping (Jennifer Yu), president of Rothschild’s Greater China Region, whose husband is rumored to be Jiang Zemin's adopted son.

broad.standard 6 ‘j'KiEEi; 1 1 5 Talk:®ti_(ilt5t) UserJalk:® 

®ii broad.standard 7 31115-(MiS#) Talk:® 

Talk:®fi_(lSSlM) User Talk: 

Zhang Yu. The rule is likely due to the actress of this name, who publicized sex videos involving movie directors in Nov 2006.
This section covers GFW rules regarding Hong Kong, Taiwan, Tibet and Uyghur. The numbers suggest that Chinese censors
worry about Hong Kong much more than Taiwan.

• Hong Kong, 52 rules total. 17 rules are for Hong Kong politicians in the pan-democratic camp, and 21 rules are for
protest rallies in Hong Kong.

• Taiwan, 15 rules total. 6 rules are for presidential candidates in the 2012 election, 4 are about presidential 
elections in Republic of China (a.k.a. Taiwan). 

• Tibet, 39 rules total. 5 rules target the English Wikipedia. 

• Uyghur, 22 rules total. 2 rules target the English Wikipedia. 

In addition, in GFW’s rulebook for HTTP response filtering (see Section 5 and Table 5.1), there are five terms
targeting Wikipedia that are related to Tibet. The term for “Dalai Lama” is one of them, and it interrupts many
sensitive and non-sensitive Wikipedia pages.
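
The per-region rule counts listed above can be tallied to make the comparison concrete. The following is a minimal Python sketch; the dictionary simply restates the figures from the list, and the variable names are illustrative rather than taken from the study's tooling.

```python
# Rule counts per region, as stated in the list above.
regional_rules = {"Hong Kong": 52, "Taiwan": 15, "Tibet": 39, "Uyghur": 22}

total = sum(regional_rules.values())

# Share of the regional rules that each region accounts for.
shares = {region: count / total for region, count in regional_rules.items()}

for region, count in sorted(regional_rules.items(), key=lambda kv: -kv[1]):
    print(f"{region:10s} {count:3d} rules  {shares[region]:.1%}")

# Ratio behind the Hong Kong vs. Taiwan observation.
hk_to_tw = regional_rules["Hong Kong"] / regional_rules["Taiwan"]
```

With these figures, Hong Kong accounts for 52 of the 128 regional rules, roughly 3.5 times Taiwan's 15. The five Tibet-related terms in the HTTP response filtering rulebook are not included in this tally.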


Term    GFW Rule    Pages Affected


Regional: Hong Kong 


broad, standard 2 

broad, standard 4 TalfcWffiSj! 

Szeto_Wah prefix.standard 1 Szeto_Wah 

Szeto Wah, (1931 - 2011/01/02), Hong Kong politician and activist. Iconic figure of the Hong Kong democracy movement. Founder of the
Hong Kong Alliance in Support of Patriotic Democratic Movements in China.


mm 


broad, standard 
broad, standard 


1 

4 Talk:^Sig User:#User_talk:#?®H^tt 


Martin Lee Chu-Ming, prominent Hong Kong politician, regarded as “Father of Hong Kong Democracy” by the pan-democracy camp.
broad.standard 2 Talk:^^A 

Lee Cheuk-Yan, Hong Kong politician and activist. Chairman of the Hong Kong Alliance in Support of Patriotic Democratic Movements in China. 

'fSJ'K't broad, standard 2 Talk: 'fnTfS't 

Albert Ho Chun-Yan, member of the Legislative Council, former chairman of the Democratic Party. 

broad, standard 1 

broad.standard 4 Talk:HI'S UsenfflljiSo : ! 

User_talk:fiij£ : ! 

Leung Kwok-Hung, Hong Kong activist, nickname “longhair”, member of the Legislative Council. 

^Ji@P broad, standard 1 

HO HOT broad, standard 2 Talk: SO HOT 

Emily Lau Wai-Hing, Hong Kong politician, member of the Legislative Council, current chairwoman of Democratic Party. 

broad, standard 1 'jKAA-Oi'H ARfD) 

broad.standard 3 ?fiAA_(SISjfi±) Talk:?SAA 

Cheung Man-Kwong, Hong Kong politician, former member of the Legislative Council, member of the pan-democracy camp. 


SOM 33 prefix, standard 

Lau Nai-Keung, pro-CCP Hong Kong politician. 


1 


prefix.standard 2 : 

Henry Fok Ying-Tung, (1923 - 2006/10/28), Hong Kong businessman, regarded by many as the most powerful Hongkonger in the politics of China. 

prefix.standard 2 

David Clive Wilson, Baron Wilson of Tillyorn, 27th Governor of Hong Kong (from 1987 to 1992).


wmwi r 
#i®®# 

Hong Kong march. 


broad, standard 
broad, standard 


0 

1 Category:#?®®# 


-U —Wfa 


prefix.standard 
A—■jE'fT prefix, standard

zh.wikipedia.org/zh-tw/$term
zh.wikipedia.org/zh-hant/$term
A—’AtSfr zh. wikipedia.org/zh-tw/Sterm 

zh.wikipedia.org/zh-hk/$term
zh.wikipedia.org/zh-hant/$term
W'M't, —zh. wikipedia.org/zh-hant/Sterm 
HriSA—'tSfJ zh. wikipedia.org/zh-tw/Sterm 

zh.wikipedia.org/zh-hk/$term
zh.wikipedia.org/zh-hant/$term

1 

l A—AiSIf 

l #*8A-J9??t 

1 #«A-ISA 


?F?S?A—'AKA - prefix.standard 1 

various terms for the Hong Kong 1 July Marches, an annual protest rally with hundreds of thousands of participants.


2003^W'®'t^i® ; fT prefix, standard 

Hong Kong 1 July March in 2003. 

1 

A — AS broad, standard 

2 Talk:A—Ai tL 


A—Af8 broad.standard 2 Talk:A — SftfS 

July 1st effect, a term describing the effect of pro-democracy sentiment on voters voting for candidates in the pan-democracy camp. 


2007^1X5^ AAi=f $13 broad, standard 

2007AfilfAA ; £ : $:B ; broad, standard 

mmn 

2007 rally protesting China's veto of universal suffrage.

1 

2 Talk:2007^,£lf AA^iA^illSA 1 

2007^1Slit r jll|i prefix, standard 

2007^§?§Esi 1 ll'il;|S prefix.standard 

1 

5 2007^ : i : ^[Ifit-iIi|ifSjlAA^ 2007^ : §itl£ti#iSfl& 
«§¥ 2007^#/SES#iiSIS* ... 

2007 district legislature election. 


2008^^1X20127X13 12; broad, standard 

ASMt 

2008^#lX2012tt ; f il broad, standard 

1 

2 Talk:2008 WtonftWHAiSff 


AlAr 

2008 rally for double universal suffrage in 2012. “double” refers to the elections for the chief executive and legislature. 


broad, standard 1 

2012 ^Hif3! broad.standard 3 Talk:2012^#^ii Category:iJ##?g2012^#WiIfi^ilSA 

double universal suffrage in 2012. “double” refers to the elections for the chief executive and legislature. 


RfflA tSWSs broad, standard 0 

KUSAffiPlifi broad.standard 2 Talk:PJKAlll^lll 

Civil Human Rights Front, a platform affiliated with almost all pan-democracy groups, organizer of the annual July 1st Marches.


broad, standard 

PcAM-fHi®) broad, standard 

0 

5 Talfop^AStuClr/i;) Template: IX; A3I-(1lr?fi)/meta/color Template: Pc; AS_(§/S)/meta/shortname Category: PcAH-AIAfl)

Democratic Party (Hong Kong). 


HAftS] broad, standard 

1 


ESSfrRj broad.standard 2 Category:HSfTJKjfiKil 

April Fifth Action, a socialist organization that nonetheless aligns more with the pan-democracy camp and opposes the CCP regime.


HfSIXA broad, standard 

23 W/iSAIS W/SKA/S @®KAlSli?AIA ^/SXA 
M Talk:HP®KAKAt$ii1l r Template:UserJ|r®i?;A3 ... 

Hong Kong democracy. 


HP®® AAA broad, standard 

HfUllSAAti broad, standard 

Hong Kong independentism. 

1 

1 

§ffi'SS;A'/JS broad, standard 

13 WM1 KAMAAX'JS Talfc^fggg; 

A W. Template:#®£XAMfi®lAI£tS#i!IIS ... 

continued on next page... 
Term GFW Rule Pages Affected


Hong Kong pan-democracy camp. 


zh.wikipedia & $term


2 Talk:®gflf 


Hong Kong Alliance in Support of Patriotic Democratic Movements in China. 


Regional: Taiwan 


prefix.standard 3 

Ma Ying-Jeou, President of the Republic of China (a.k.a. Taiwan). Ma was the (winning) presidential candidate of the Kuomintang in the 2012 election.

prefix, standard 2^RiirtH 

Wu Den-Yih, Vice President of the Republic of China (a.k.a. Taiwan), running mate of Ma Ying-Jeou in the 2012 election.

prefix.standard 2 

Tsai Ing-Wen, former chairwoman of the Democratic Progressive Party (DPP). Tsai was the DPP's presidential candidate in the 2012 election.


prefix, standard 1 

Su Chia-Chyuan, running mate of Tsai Ing-Wen in the 2012 election. 


prefix, standard 1 

James Soong Chu-Yu, founder and chairman of the People First Party (PFP). Soong was the PFP's presidential candidate in the 2012 election.


prefix, standard 1 

Lin Ruey-Shiung, running mate of Soong Chu-Yu in the 2012 election. 


broad, standard 0 

broad, standard 2 Talk: 1518? 

Cary S. Hung, Taiwan activist, with involvement in mainland China’s democracy movement. 


ass 


broad, standard 
broad, standard 


1 

7 


Talk: ^ 1 


Talk Mi 


Category: rftiP 1 


Mongolian and Tibetan Affairs Commission, a government agency of the Republic of China (a.k.a. Taiwan).


ismmm 

Taiwan election. 

prefix, standard 

2 

Taiwan president. 

prefix, standard 


+ prefix. standard 

President of the Republic of China.

is tpwRmmmm 

2012 ^ prefix.standard 

2012 presidential election in the Republic of China (a.k.a. Taiwan).


niiSiS HflflS prefix, standard 

World United Formosans for Independence. 

1 

Regional: Tibet 

Dalai_Lama

£1M!R 

Dalai Lama. 

en.wikipedia.org/wiki/$term 

prefix, standard 

0-EN:11 Dalai_Lama(en) Dalai_Lamas(en) Dalai_Lama_V(en)
Dalai_Lama_(song)(en) Dalai_Lama_Renaissance(en) ...

Tenzin_Gyatso en.wikipedia.org/wiki/$term

Religious name of the 14th Dalai Lama. 

0-EN:4 Tenzin_Gyatso(en) Tenzin_Gyatso_(Dalai_Lama)(en)
Tenzin_Gyatso,_14th_Dalai_Lama(en) ...

14th Dalai (Lama). 

broad, standard 

broad, standard 

6 SI+Effi£$S|l$l5S Talk:M+ 

4m+Etii:tilWPJS Sf+Effl 

SftWlfl-iflfJg Talk:®+Effijlffl#!l*S 

is®*® 

broad, standard 

broad, standard 

1 

2 TalkbtHMl 
Dalai Group.


broad, standard 

Phuntsog Nyidron, a Tibetan Buddhist nun, jailed from 1989 to 2004. 

i 




litzh.wikipedia.org/zh-cn/$term 1 

Dhondup Wangchen, producer of the documentary Jigdrel ('F'B^aR, Leaving Fear behind), in prison since Mar 2008. The co-producer A JUStif 
(Jigme Gyatso) is not on GFW’s rulebook. 


JE^&E-EE broad, standard 1 

Tsepon Wangchuk Deden Shakabpa, (1907 -1989/02/23), Tibetan historian. 


ffiUtlTl® broad, standard 

broad, standard 

Tibet issues. 

2 Talk:BH|nM 

2 ®^I1W»1*I 

Flag_of_Tibet en.wikipedia.org/wiki/$term

It ill $ if 1 Stt broad, standard 

broad, standard 

Snow Mountain and Lion Flag, the flag of Tibet. 

0-EN:2 Flag_of_Tibet(en) Flag_of_tibet(en) 

1 

2 Talk: S ill if?® 

broad, standard 

M. Template:Country_data_H$ii/ftt®JlT ... 

Tibetan government in exile. 


broad, standard 

2 TalkiiiS^fi 


illM HIS broad, standard 1 

Dharamsala, a town in India, home to Tibetan refugees and the Tibetan government in exile.


He® prefix, standard 

broad, standard 

1 

6 WtMMWi IS 8 A ill IIS Template:UserJstii® Template:User Category:Jx&lSISfi^lft3SA ...

Tibetan independence. 


Tibetan_Independence_Movement en.wikipedia.org/wiki/$term

broad, standard 

Tibetan independence movement. 

0-EN:2 Tibetan_Independence_Movement(en) 

Tibetan Jndependence_movement(en) 

2 Talk:HH® sLjKij) 

Students_for_a_Free_Tibet en.wikipedia.org/wiki/$term
§ S broad, standard 

§ SHillKSjllitt broad.standard 

Students for a Free Tibet Movement. 

0-EN:1 Students_for_a_Free_Tibet(en)

2 

i 

prefix, standard 
zh.wikipedia.org/zh-sg/$term
International Campaign for Tibet. 

i 

ffijSyAlIf.HZii! 0 broad, standard 

USE AS/E A. 0 broad, standard 

1 

2 TalkffiHARiSA 0 


broad, standard 1 

Tibetan Uprising Day, observed on March 10 to commemorate the 1959 Tibetan uprising. 


broad.standard 2 


broad.standard 2 

Nangpa La Shooting Incident. On September 30, 2006, a group of Tibetan refugees was shot by Chinese border guards while crossing Nangpa La.


2008 ;{ FiSAijliX.S'f i t 1 broad, standard 

2008S3HAitLSVfr broad, standard 

Tibetan Protests in 2008. 

i 

i 

2012 ^HE» prefix, standard 

Riots in Tibetan Region in 2012. 

i 


MISSIS prefix, standard 1 

Serial Self-immolations in Tibetan Region, more than 120 cases from 2009 to 2013. 
prefix.standard 1

The Tibetan Book of Living and Dying, a book on Tibetan spirituality, not politically sensitive. 


W%n]SWt$iW!. broad, standard 0 

SlnSilfi'tlW; broad, standard 1 

Songs for Tibet - The Art Of Peace, an album by western musicians, released in 2008. 


Regional: Uyghur 


East_Turkestan_Independence_Movement en.wikipedia.org/wiki/$term

0-EN:3 East_Turkestan_Independence_Movement(en) 
East_Turkestan Jndependence_movement(en). .. 

Rebiya_Kadeer en.wikipedia.org/wiki/$term

prefix, standard 
ttH broad, standard 

M ttffi broad, standard 

Rebiya Kadeer, Uyghur activist. 

0-EN:1 Rebiya_Kadeer(en)

2 miE'-Ntft 

2*ttS-Aftft Talk:»fckS-i?!Sfr; 

2 S§tt@--N§If 

broad, standard 

2 Talk:fi#H£C 


HUH'/I broad, standard 0 

Hoseyinjan Jelil, East Turkestan independence activist, sentenced to life in prison in 2007. 


broad, standard 

broad, standard 

Mehmet Emin Hazret, East Turkestan independence activist. 

2Talk:XX»33C*i§ 

0 


SIuItE prefix, standard 1 

Uyghur, an alternative term for the current standard term ‘‘A n T ”, which is not on GFW's rulebook.


broad, standard 

18 

MmmwmwwmM ... 

short name for East Turkestan. 


zK^KBt±! broad, standard 

im ... 

East Turkestan. 


broad.standard 

broad.standard 

East Turkestan Liberation Organization. 

i 

i 

broad.standard 

East Turkestan Information Center. 

2Talk:^±5^»T±mt't'A' 

broad.standard 

2 Talk:1S#tt^W^fi;*A# 


tlf broad.standard 1 

World Uyghur Youth Congress. It merged with a few other groups and formed [ TFr §§■ n /K ff: ffflrr, (World Uyghur Congress), which is not 
on GFW’s rulebook. 


Aigstl broad.standard 

iffilSiliSSS broad.standard 

Xinjiang independence movement.

2 Talk:Sfil®AisS] 

1 

.S/Izh.wikipedia.org/zh-hans/$term 
Urumqi July 5th Incident, a riot in 2009.

1 

prefix, standard 

1 


An incident on Sep 4, 2009 in which the Urumqi armed police beat three Hong Kong reporters.


zh.wikipedia.org/zh-hant/$term 1

The 10 Conditions of Love, a documentary about Rebiya Kadeer. 


Table 8.5: GFW Rulebook: Regional: Hong Kong, Taiwan, Tibet, Uyghur 
8.6 Falungong

Falungong, a religious group persecuted in China, used to be the motivation and focus of GFW. There are 18 rules
here, and none seems to be new. Many more Wikipedia articles relate to Falungong, but they are not on GFW’s
rulebook for HTTP request filtering. The reason is that GFW has five Falungong terms in its rulebook for HTTP
response filtering, i.e. “flg”, “falun”, “’Sf&Slj”, “'SffcA'S; & ^HR”, and n &. These
rules effectively block all Falungong-related Wikipedia articles, plus many other Wikipedia pages, sensitive or
not, that are not directly related to Falungong.
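The difference between the two filtering stages can be sketched as follows. This is only an illustration under stated assumptions: the function names, the sample request rule, and the use of a plain case-insensitive substring scan are hypothetical. Only the keyword “falun” is taken from the list above.

```python
# Illustrative sketch only; not GFW's actual implementation.
# REQUEST_RULES is made-up sample data; "falun" is one of the
# response-filtering terms named in the text.

REQUEST_RULES = ["zh.wikipedia.org/zh-cn/example_term"]  # hypothetical rule
RESPONSE_KEYWORDS = ["falun"]


def request_blocked(url: str) -> bool:
    """Request filtering: only the requested URL is inspected."""
    return any(rule in url for rule in REQUEST_RULES)


def response_blocked(body: str) -> bool:
    """Response filtering: the returned page content is inspected.

    Case-insensitive matching is an assumption of this sketch.
    """
    lowered = body.lower()
    return any(keyword in lowered for keyword in RESPONSE_KEYWORDS)


def page_blocked(url: str, body: str) -> bool:
    """A page is unreachable if either stage matches."""
    return request_blocked(url) or response_blocked(body)


# A page whose URL matches no request rule is still blocked
# when its content contains a response keyword.
print(page_blocked("zh.wikipedia.org/zh/some_page", "... Falun ..."))
print(page_blocked("zh.wikipedia.org/zh/some_page", "unrelated text"))
```

Under this model a handful of response keywords can cover far more pages than any list of page titles, which is consistent with the small number of Falungong request rules.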


Term 

GFW Rule 


broad, standard 

'ffifiti 

broad, standard 


Falun, dharma wheel, the first two characters of Falungong. 

broad, standard 

Li Hongzhi, founder of Falungong. 


Pages Affected 

M 'AtAftSAHWS1SM... 

44 fifraSj 'ffilm-fifjlIJi) 

msh ... 


3 Talk:#^ 


broad.standard 2 Talkiii^ili 

jUJHil! broad, standard 1 

wave of resignations from the Chinese Communist Party. 

broad, standard 2 Talk:^lif 

broad, standard 1 

Nine Commentaries on CCP, a series of articles published by Falungong organization. 

broad, standard 1 

broad.standard 2 Ta\k:WM^LV-f¥ 

Sujiatun incident, an alleged concentration camp in Sujiatun in which thousands of Falungong practitioners were allegedly tortured. 

AfflTUB'J'ffi broad.standard 3 Talki^CffijcB'fffi Talk:^CtS7tB'tffi/'i? : Sl 

A&HtcHtFUx broad, standard 1 

Epoch Times, a newspaper operated by the Falungong organization. 

SfJ*A%«£ broad.standard 2 Talk:§TJtA 

R/SAStln broad.standard 2 TalkiSrfSAlttJl n 

New Tang Dynasty Television, a TV station operated by the Falungong organization. 


AAffi-(AA) broad.standard 2 Talk:ARffi-(A X) 

AKSLfAA) broad, standard 0 

People's Newspaper, renminbao.com, a news site operated by the Falungong organization. 

broad.standard 2 Talk:^^fl 

broad, standard 0 

Tiananmen self-immolation incident, which took place in Tiananmen Square on Jan 23, 2001. The Chinese authorities claim the person was a
Falungong practitioner, while the Falungong organization claims the incident was staged to defame Falungong.

ftMAA broad.standard 2 Talk:#SA^HP^r#ft n 

Sound of Hope Radio Network, a radio station affiliated with Falungong. 


Table 8.6: GFW Rulebook: Falungong 
8.7 Tiananmen

Among all sensitive topics in China, the most sensitive is the Tiananmen Square Protest and Crackdown of 1989, also
known as the June 4th Incident, the June 4th Movement, the June 4th Massacre, etc.

The Chinese authorities have been very successful in erasing the memory of Tiananmen from the Chinese people. There is
almost no mention of this historical event inside China, whether in news, books, TV, or on the internet. As a result, most people
in China do not know what happened in 1989, and the majority of the younger generation have not even heard of it.
GFW has been instrumental in constructing this memory hole.

This section alone contains 116 such rules. In addition, we will see many obscure terms motivated
by the Tiananmen Incident in Section 8.9. However, one surprising finding is that there are no Tiananmen-related rules
in GFW’s HTTP response filtering (see Section 5), hence many Tiananmen-related articles are actually accessible in
China, e.g. http://en.wikipedia.org/wiki/Tankman. Also, terms blocked by prefix rules can be accessed through a non-blocked
variant. For example, “64事件” (June 4th Incident) is blocked by prefix rules only for the “/wiki/”, “/zh-cn/”
and “/zh-hk/” variants; the other variants are all accessible in China, e.g. http://zh.wikipedia.org/zh/64事件.
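The variant behaviour described above can be sketched in a few lines. This is a minimal illustration, assuming a prefix rule is a "host/variant/term" string compared against the start of the requested URL with the scheme removed. The function names, the placeholder term TERM, and the case-insensitive comparison are assumptions of this sketch. The Tank_Man rule is the one listed in the table of this section.

```python
# Illustrative sketch only; the matching model is an assumption.

def matches_prefix_rule(url: str, rule: str) -> bool:
    """Return True if the URL (scheme removed) starts with the rule string."""
    without_scheme = url.split("://", 1)[-1]
    return without_scheme.lower().startswith(rule.lower())


def is_blocked(url: str, rules: list[str]) -> bool:
    """A URL is blocked if any prefix rule matches it."""
    return any(matches_prefix_rule(url, rule) for rule in rules)


# Hypothetical rule set: the placeholder TERM is listed for three variants only.
zh_rules = [
    "zh.wikipedia.org/wiki/TERM",
    "zh.wikipedia.org/zh-cn/TERM",
    "zh.wikipedia.org/zh-hk/TERM",
]
print(is_blocked("http://zh.wikipedia.org/zh-cn/TERM", zh_rules))  # listed variant
print(is_blocked("http://zh.wikipedia.org/zh/TERM", zh_rules))     # unlisted variant

# The English rule for Tank_Man also catches longer titles sharing the prefix,
# but not the differently spelled title Tankman.
en_rules = ["en.wikipedia.org/wiki/Tank_Man"]
print(is_blocked("http://en.wikipedia.org/wiki/Tank_Man_(Battle_Angel_Alita)", en_rules))
print(is_blocked("http://en.wikipedia.org/wiki/Tankman", en_rules))
```

Because each rule names one variant explicitly, a term stays reachable through every variant the operators did not list.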

This is exactly the opposite of the Falungong case. For Falungong, there are only 18 rules in GFW’s HTTP request
scan, but the five rules in GFW’s HTTP response scan effectively block all Falungong-related content.

We roughly divide the terms into the following groups: 

• Those targeting the English Wikipedia. 5 total. 

• June 4th event. 

• Memorial events. 

• People’s Liberation Army units which participated in the June 4th crackdown. 

• Publications and songs. 

The grouping is not clear-cut. A common prefix is another consideration for grouping, so terms sharing a
prefix are listed together even though they might have different meanings. Also note that one Wikipedia URL might match
multiple GFW rules, because the Chinese authorities have put a great many rules into GFW.


Term GFW Rule Pages Affected 

Terms Targeting English Wikipedia 

Tiananmen_Square_Protests_of_1989 en.wikipedia.org/wiki/$term 0-EN:3 Tiananmen_Square_Protests_of_1989(en) Tiananmen_Square_protests_of_1989(en) Tiananmen_square_protests_of_1989(en)

prefix, standard

Note that GFW has this term for both English and Chinese Wikipedias, even though there is no such page in the Chinese version. 


Tank_Man en.wikipedia.org/wiki/$term 0-EN:3 Tank_Man(en) Tank_man(en) Tank_Man_(Battle_Angel_Alita)(en)

prefix, standard

Note that GFW has this term for both English and Chinese Wikipedias, even though there is no such page in the Chinese version. However, the
article en.wikipedia.org/wiki/Tankman is accessible.


Tiananmen_Massacre en.wikipedia.org/wiki/$term 0-EN:2 Tiananmen_Massacre(en) Tiananmen_massacre(en)


Tiananmen_Papers en.wikipedia.org/wiki/$term 0-EN:1 Tiananmen_Papers(en)

A book published in 2001. The Chinese version was published under the name 中國六四真相 (The Truth of June 4th in China) the same year.


20th_anniversary_Tianan wiki/$term O-EN: 1 20th_anniversary_Tiananmen_square Jncident_march(en) 

men_squareJncident_march 

Note that the rule’s Wikipedia part is simply “wiki/”. This is because GFW rules are limited to 64 bytes in size, and a rule with the full Wikipedia host name would exceed that limit.
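The arithmetic behind this shortened rule can be checked with a short sketch. The 64-byte limit and the resulting “wiki/” prefix come from the note above. The list of candidate prefixes and the try-longest-first strategy are assumptions made for illustration, not a description of how GFW operators actually chose the rule.

```python
# Illustrative sketch only; the candidate list and strategy are hypothetical.

RULE_LIMIT_BYTES = 64


def rule_size(prefix: str, term: str) -> int:
    """Size of the rule string in bytes (UTF-8)."""
    return len((prefix + term).encode("utf-8"))


def first_fitting_rule(term: str, prefixes: list[str]) -> str:
    """Return the first prefix + term that fits within the limit.

    Prefixes are tried in the given order, most specific first.
    """
    for prefix in prefixes:
        if rule_size(prefix, term) <= RULE_LIMIT_BYTES:
            return prefix + term
    raise ValueError("no candidate prefix fits within the limit")


# The capitalisation of "Incident" is unclear in the source table;
# it does not change the byte count.
term = "20th_anniversary_Tiananmen_square_Incident_march"
candidates = ["en.wikipedia.org/wiki/", "wikipedia.org/wiki/", "wiki/"]

for prefix in candidates:
    print(prefix, rule_size(prefix, term))

print(first_fitting_rule(term, candidates))
```

With the full prefix "en.wikipedia.org/wiki/" the rule would be 70 bytes, so only the "wiki/" form (53 bytes) fits among these candidates.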


June 4th Event 

June_4th prefix.standard 0 

Note that this rule is for the Chinese Wikipedia. This article was deleted in Sep 2013, see the notice at June_4th.

Liusishijian prefix.standard 1 Liusishijian 

zh.wikipedia.org/zh-hans/$term
This is the pinyin for 六四事件 (June 4th Incident).


continued on next page... 
1989年 broad, standard

Year 1989. This rule affects 232 Wikipedia pages. 

232 1989®A$P!®# 1989®®® M_(1989®) §111989® 
Talk: 1989® Category: 1989® Category_talk:1989 ®SM ... 

1989A$n 

prefix, standard 

zh.wikipedia.org/zh-cn/$term

1 

1989A$H 

1989 Tiananmen. 

zh.wikipedia.org/zh-cn/$term

1 1989AAP1 

6.4®# 

prefix, standard 
zh.wikipedia.org/zh/$term
zh.wikipedia.org/zh-cn/$term
zh.wikipedia.org/zh-hant/$term

1 

64®# 

June 4th Incident. 

prefix, standard 

zh.wikipedia.org/zh-cn/$term

zh.wikipedia.org/zh-hk/$term

1 

8964 

prefix.standard 

1 

the year, month and date of the June 4th crackdown. 


89 

prefix.standard 

1 

89®fe 

89 student movement. 

prefix, standard 

1 

89isSj 

89 movement. 

prefix, standard 

zh.wikipedia.org/zh-cn/$term

1 


89R;iS prefix.standard 1 

zh.wikipedia.org/zh-cn/$term
zh.wikipedia.org/zh-hans/$term


89 democracy movement. 


A A prefix, standard 

eight nine. 


14AASjSL AAA^HW# AA®®g;±i5# AA®is 
AAfAAAflRPfff AAA» AAKAilSl ... 


AlH prefix.standard 44 AHSSl AH®# AHAI^I AHAJSK AHf.B#fil 

AH#® AHiH® AHAAffr AHAME® ... 

zh.wikipedia.org/zh/$term
zh.wikipedia.org/zh-tw/$term
zh.wikipedia.org/zh-hk/$term
zh.wikipedia.org/zh-hant/$term

Six four; the most common short name for the Tiananmen Incident. Note that many terms with this prefix are blocked by the broad rule, see below.


AH®# 


June 4th Incident. 


broad, standard 

zh-yue.wikipedia.org/wiki/$term 


34AEWAt AH®#&£ AH®#i0±'/gStl Talk:A 
H®# Template:AH®# Category: AH®# ... 


AHIfcffi 

June 4th songs. 

broad, standard 

4Talk:AH@fcffi UserAjiJfi/AH®:® Category: A Hflfcffi 

Ah if® 

AHtf® 

June 4th Poetry Collection. 

broad, standard 

broad, standard 

2 Talk: A Hit® 

0 

AHS® 

June 4th badge. 

broad, standard 

2 Talk: A HIS# 

gE broad, standard 1

June 4th Internal Diary, a book published in 2006 based on the diaries of Lu Chaoqi, then acting chief editor of People's Daily.

AH18M® 

broad, standard 

i 


18th anniversary of June 4th Incident.
Pages Affected



P£S«0 

mmm * 

June 4th Incident. 

wikipedia.org & $term
wikipedia.org & $term

1 

1 User:|E;i§i!tfpi 


prefix, standard 1 

“political disturbance between spring and summer”, a euphemism used by the Chinese authorities in the 1990s to describe the June 4th Incident.


prefix, standard 1 

zh.wikipedia.org/zh/$term 

zh.wikipedia.org/zh-tw/$term


Beijing democracy movement, this article redirects to the June 4th Incident article. 

broad.standard 1 

broad.standard 0 

Beijing Students Autonomous Federation, a major organizer of the Tiananmen protest.

zh.wikipedia.org/zh-tw/$term 1 A g

Independent Federation of Chinese Students and Scholars of USA, an organization formed in 1989. 

broad, standard 0 

IPS® broad, standard 0 

Short name for the above organization. Wikipedia does not have articles with these two titles. 

broad.standard 3 KzfeAWft Talk:R±^C# 

Goddess of Democracy, a statue erected in Tiananmen Square in 1989.

KiEZLlf B.W prefix.standard 1 

zh.wikipedia.org/zh-tw/$term

a manifesto published during the establishment of the Goddess of Democracy. 

i^rSrn prefix.standard 19 AiSrH-CMlKA) Jt) 

^ ^nmm^ ... 

zh.wikipedia.org/zh-cn/$term

ASAP zh.wikipedia.org/zh-tw/Sterm 11 XArPP ^iSrPP-f&EISA) XpSPWfr Tz'fz PPftKffi A:S:PP 

pappus p^pps# ^ppgft 

Tiananmen. Note that many terms with this prefix are blocked by the broad rule, see below. 

ASrPPfJlS prefix.standard 1 

zh.wikipedia.org/zh-cn/$term

Tiananmen Mothers, an activist group of families of June 4th victims. 

broad, standard 1 

A^PPfJiSSSS broad, standard 0 

Tiananmen Mothers Movement. 

broad.standard 6 

Talk:^n*# 

ASrPP?-# broad.standard 16 1989 ; ¥^$PP^ : 'f4 : A 0 AS: PP^-ft 05 ASrPP^ftt 1 

User:55PP¥# User:8M«/\055PPW6AKS ! ... 

zh-yue.wikipedia.org/wiki/$term 

Tiananmen Incident. The traditional Chinese version is one of four terms that target the Yue (Cantonese) Wikipedia. 

ASrPPMS prefix, standard 1 

zh.wikipedia.org/zh-cn/$term
zh.wikipedia.org/zh-hk/$term

Tiananmen massacre. 

broad, standard 2 Talk:?&n_(£EStf) 

Tiananmen (documentary), a PBS documentary on the Tiananmen Incident. The English title is “The Gate of Heavenly Peace”.

5 50 50 broad, standard 1 

53fPP50 broad.standard 2 Talk:5j£cPP3t0 

Tiananmen Papers, a book published in 2001. Its Chinese title is 中國六四真相 (The Truth of June 4th in China).

A H1/N0 Jtffl prefix.standard 1 
Term GFW Rule Pages Affected

The Truth of June 4th in China, Chinese title of the book Tiananmen Papers. 

A Al'l prefix.standard 1 

Chengtianmen, the original name of Tiananmen, renamed to Tiananmen in 1651. An obscure term. 

SHI'! prefix.standard 14 ^IH ['1 _(^ IIBl'l-ClSiS:) HHnV# UHl'l# StH 

ntf nanffiji smn#Affi... 

Jianguomen, a gate and area in Beijing. It may be blocked for two reasons: the killings in this area in 1989, or a mass shooting
incident on Oct 20, 1994.

3E®# broad.standard 3 Talk:!®# UserJalk:!®# 

!!f£# broad, standard 1 

Wang Weilin, a widely circulated name of the Tank Man, whose identity and whereabouts remain a mystery. 

broad, standard 1 

MwffSb broad, standard 1 

Operation Siskin, or Operation Yellow Bird, an operation to rescue Chinese students and citizens who were wanted by the Chinese authorities after
the June 4th crackdown.


AAtfifr zh.wikipedia.org/zh-tw/$term 1

zh.wikipedia.org/zh-hk/$term

Worldwide Chinese Rallies, which happened overseas on May 28, 1989, to support the protests in China. 


Memorial Events 

mm prefix.standard 9 

h& mm&mm ... 

Victoria Park, a public park in Hong Kong. It is the venue of the annual vigil for the June 4th Incident. 

broad, standard 0 

broad.standard 3 Talk:liH7\EW7t0fe'^' 

Victoria Park June 4th Candlelight Vigil. 

zh.wikipedia.org/zh-cn/$term 1

zh.wikipedia.org/zh-hant/$term

Victoria Park Candlelight Vigil. 


prefix, standard 


Hong Kong June 4th Assembly. 


1 


prefix, standard 1 

zh.wikipedia.org/zh-tw/$term
zh.wikipedia.org/zh-hk/$term
Hong Kong June 4th Candlelight Vigil.


People’s Liberation Army (PLA) Units 

®HARf?fi$:®J£20J!i 

®¥ 

the 20th group army of PLA. 

prefix, standard 

1 


B¥ 

the 27th group army of PLA. 

prefix, standard 

zh.wikipedia.org/zh-tw/$term

1 


38® prefix, standard 1 

Sfl38® prefix.standard 2 fit38®_( 0 ®Hi®) 

¥ HAf5:flljj[®|l38jfl prefix, standard 1 

S¥ 

zh.wikipedia.org/zh-cn/$term

®BIAK;$Pf$£®fiS38^fl prefix, standard 1 

B® 

various terms for the 38th group army of PLA, which committed the most killings in the June 4th crackdown. 

39® 

®HARtfS£¥Jg39il 

a® 

prefix, standard 
prefix, standard 

1 

1 
Pages Affected


short name and full name of the 39th group army of the PLA.

A H ASI¥S$i¥ifj54jfl zh. wikipedia.org/zh-cn/Sterm 

1 ahas«$¥: 

i?54 

the 54th group army of the PLA.




65¥ prefix.standard 1 

zh.wikipedia.org/zh-hant/$term

66455 nP PA prefix, standard 1 

F f 5 lllAH;S?fi^¥®65® zh. wikipedia.org/zh-cn/Sterm 1 . Ar- ^65AEt| A 

short name, internal code, and full name of the 65th group army of the PLA.

Publications & Songs 

SiS 9K Eft prefix, standard 1 

zh.wikipedia.org/zh/$term
zh.wikipedia.org/zh-tw/$term
zh.wikipedia.org/zh-hk/$term
zh.wikipedia.org/zh-hant/$term

SiSWlffi prefix.standard 0 

Songs of democracy movement. This page redirects to A01R® (June 4th songs.) 

SASfcSIfRASI prefix.standard 1 

zh.wikipedia.org/zh-hk/$term
zh.wikipedia.org/zh-hant/$term

Concert for Democracy in China, a large concert on May 27, 1989 in Hong Kong to support the Tiananmen protest.

fijA&'l'fijn zh.wikipedia.org/w & Sterm 2 Talk:JJi6tl'fSP 

Wound of History, a song in support of the Tiananmen protest, a collaboration by many Hong Kong and Taiwan musicians.

7 ft prefix, standard 1 

Blood-Stained Glory, a Chinese patriotic song, later used by Hong Kong people to commemorate the June 4th Incident. 

zh.wikipedia.org/zh-hans/$term 1 . 

The People Do Not Forget, a book by Hong Kong reporters who covered the Tiananmen protest in 1989. 

@[=1=1 prefix, standard 1 

Flower of Freedom, a song inspired by the Chinese democracy movement. It became the theme song of the Victoria Park June 4th Vigil.


Table 8.7: GFW Rulebook: Tiananmen 
8.8 Miscellaneous

This section covers those terms not in any of the previous categories. Many terms here are surprising, amusing, puzzling 
and/or intriguing. 


Term GFW Rule Pages Affected 

Great_Wall_of_China en.wikipedia.org/wiki/$term 0-EN:8 Great_Wall_of_China(en)

Great_Wall_of_China_hoax(en) Great_Wall_of_China_(album)(en)
Great_Wall_of_China_Marathon(en) ...

This one is very odd. The page is entirely non-sensitive. Why does GFW have this rule?

38SI prefix.standard 3 nSfitA.^ SfcSSifcft 

zh.wikipedia.org/zh-cn/$term

prefix.standard 5 ISSfRFC ISSt®* ISJtS'fT 1SSISH 

zh.wikipedia.org/zh-cn/$term

The term literally means “evil fun”, similar to the notion “kuso” in Japanese. It usually involves parody. 

kuso prefix.standard 5 KUSO KUSO JOJSO JgIS# KUSO JOJSO JSBSiK Kuso 

Kuso_game 

Wikipedia’s description is “(a) term used in East Asia for the internet culture that generally includes all types of camp and parody”.


XW 

prefix, standard 

zh.wikipedia.org/zh-cn/$term

6 AI5±S-(SJH) AIMS? AlSflfc AfiKiS 

•ft-M 

prefix.standard 

i 

mm 

prefix.standard 

zh.wikipedia.org/zh-cn/$term

l 

mm 

prefix.standard 

l 

mmw 

prefix.standard 

l 


These five terms are gunpowder, explosives, petrol bomb/Molotov cocktail, incendiary bomb, napalm bomb, respectively. 


ffiiBfff broad.standard 688 1917 : ¥'ffiS®f¥B&' T Talk:®J8®rA 

Template:'ffiSSfMi Category:®BSff i rS : |Ml ... 

Russia, in traditional Chinese. This rule alone affects 688 Wikipedia pages. 

I'! broad, standard 1 

laSP1 broad.standard 3 1IW SPlClWifsiO TalfcfiW SP1 

Quan Yin Method, a religion derived from Buddhism.

Tokyo-Hot zh.wikipedia.org/zh-cn/$term 1 Tokyo-Hot 

a Japanese pornography brand. 

prefix, standard 1 

prefix, standard 1 

list of massacres. 

prefix, standard 1 

Independence March, the page is about the Turkish national anthem. 

2005^5^ prefix.standard 2 2005^5 j! iffitft 

May 2005. It is a mystery why this term is on GFW rulebook. The author welcomes input regarding this term. 

2012^AfB StS prefix, standard 1 

2012 human extinction theory. 

it zh. wikipedia.org/zh-hans/Sterm 1 + AWPfiAW] 

WKA 

Beijing Central People’s Trust Real Estate Development Corporation Ltd, a company owned by Xi Jinping’s sister Qi Qiaoqiao and Qi’s husband
Deng Jiagui. Both names are on GFW’s rulebook.


Table 8.8: GFW Rulebook: Miscellaneous 
8.9 Non-Articles

This section covers rules that specifically target non-article pages. Numbers in parentheses are counts of rules
targeting each namespace.

• Namespace Project (7). These include the debate pages for page deletion, page protection, and vandalism, and their
archives. Apparently many of the subjects that caused these debates are “sensitive” ones. These debates can be
fierce, shedding light not only on the subject itself, but also on which of the participating editors are inclined to toe
the Chinese Communist Party line.

• Namespace User (25). These 25 rules affect 10 users. For 7 of these, only a subpath of the User page is blocked; 
for the other 3, the entire User page is blocked. 

• Namespace Category (2). 

• Namespace File (2). 

• Namespace Talk (23). 

Most of these pages are obscure. Their inclusion in GFW seems to imply that Chinese censorship officials and GFW
operators have more than superficial knowledge of Wikipedia's structure and content, which contradicts
what we have seen so far. My speculation is that certain Chinese Wikipedia editors interact with Chinese
censors and supply these obscure pages for blocking. If that is the case, I do not blame them; I think they acted with
good intentions, hoping that in return the Chinese authorities would lift the blanket ban and block only these specific pages.


Term GFW Rule Pages Affected 

Namespace Project 

Wikipedia:Mffif?Jl8f'fllf prefix.standard 2471 Wikipedia:H^ Wikipedia:!!JUn!" 

tt/tfWikipedia:Mffii?JSt^tm/ffi¥Ix®S ... 

This is the debate page for page deletion. This rule affects 2471 pages, most of which are archive pages. 


Wikipedia: nit S i® M zh. wikipedia.org/zh-cn/$term 

This is the archive for page protection requests. 


37 Wikipedia:!!Wikipedia:!!#®® 

... 


Wikipedia:^i§ M i\f ifc prefix.standard 9 Wikipedia:Al'i§Ki’^i'fe5R?l/wikipedia Wikipedia:7'J'i!]5i'j'ife 

3^I/±SI Wikipedia:77igMitife3?3l/5>l£ ... 

This is an auto-generated page which lists “hot” Talk pages. Talk pages for sensitive topics are often hot due to edit wars. 


Wikipedia: HL &]§ prefix.standard 

This is the page where editors ask each other for help.


11 Wikipedia:5I!^ji/#®/#!§/2007#l7! Wikipedia: 5® 
»#BMf®/ 2006 # 127 ! ... 


Wikipedia:S gijffjlffifl'/ prefix.standard 1 Wikipedia:^ HU 0!ffiff/i¥ : f3/2OlO^7-12^ 

f¥H/2010#7- 


This is the vandalism archive for the period Jul to Dec 2010. Looking at the page content, quite a few parts seem “sensitive”, but nothing stands out.

Wikipedia:®]15^ K S fP prefix.standard 30 Wikipedia:®) if ^</2007^6^ 4 0 Wikipedia:®) 

if3^/2007^6 SMfPif^/2007^6^ 1 0 ... 

Page deletion nomination and voting for Jun 2007. As discussed in Section 4.2, this rule is actually for “Wikipedia:W\ P4^M^Pilt4</2007^fi6 M
4 0” (for Jun 4, 2007). It is truncated due to the 64-byte limit. That particular page contains a long debate on whether “/\|Z37SjW|^” (18th
anniversary of June 4th Incident) should have its own article.


Wikipedia: ®P^/2005^ prefix, standard 1 

iof! 

This is the chat page for Oct 2005 and it is well worth reading. From the page content, we learn that Wikipedia was blocked in Oct 2005, and
many Chinese Wikipedia editors were disappointed, some of whom blamed Chinese Wikipedia for having “too much” political content. There are
several long discussions on proposals to create a “clean” (self-censored) Chinese Wikipedia.


Namespace User 

User:Inspector/HfU±’tR prefix, standard 


continued on next page... 
...continued from previous page

Term GFW Rule Pages Affected 


This page is about the list of topics forbidden by Baidu Tieba. 

User:Lxrl234/535 prefix.standard 1 

zh.wikipedia.org/zh-hant/$term

This page contains the content of the June 4th Incident article. 

User:Mongol/arch2 prefix, standard 1 

This is a personal archive page; I do not know why it is on GFW’s rulebook.

User:Mungs/Wj® prefix.standard 1 

zh.wikipedia.org/zh-hk/$term
zh.wikipedia.org/zh-sg/$term
zh.wikipedia.org/zh-hant/$term
This page is about Hong Kong human rights. 

User:Qingmui prefix, standard 1 

zh.wikipedia.org/zh-hk/$term

This user has the content of Charter 08 on his personal start page. 

UsenTIrmq/A FI S/gTemp prefix, standard 1 

zh.wikipedia.org/zh/$term
zh.wikipedia.org/zh-cn/$term
zh.wikipedia.org/zh-hans/$term

This page has information about memorial events for the June 4th Incident. 

User:Zhangjintao prefix.standard 54 User:Zhangjintao/MainPage User:Zhangjintao/Computers

User:Zhangjintao/Favourites User:Zhangjintao/ttJSMBfc 01* ...

zh.wikipedia.org/zh/$term
zh.wikipedia.org/zh-hans/$term

This user is a heavy Wikipedia editor who must have done something that offended certain people! His personal start page is currently empty. 

User:!®!! prefix, standard 1 

This user’s personal start page contains information about the June 4th Incident. 

User:Philip/June_4 zh.wikipedia.org/zh-cn/$term 1

zh.wikipedia.org/zh-tw/$term
zh.wikipedia.org/zh-hk/$term
zh.wikipedia.org/zh-sg/$term
This page redirects to the June 4th Incident article. 

User:Liangent-bot/Base64URL/5YWt5Zub5LqL5Lu2 zh-cn/$term

pedia.org/zh-hant/$term

5YWt5Zub5LqL5Lu2 zh.wikipedia.org/zh-hk & $term

This is a robot user; this page redirects to the June 4th Incident article. These three rules all look very odd, because the target string is long (more
than 64 bytes), so GFW operators had to do some manual work here.
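The token 5YWt5Zub5LqL5Lu2 in these rules can be decoded to see what the page name refers to. The sketch below assumes, going by the “Base64URL” path segment, that the token is unpadded Base64URL over UTF-8 text. The helper name decode_base64url is hypothetical, and the full rule prefix used in the size check is an illustrative assumption.

```python
import base64


def decode_base64url(token: str) -> str:
    """Decode an unpadded Base64URL token into a UTF-8 string."""
    padding = "=" * (-len(token) % 4)  # restore stripped padding, if any
    return base64.urlsafe_b64decode(token + padding).decode("utf-8")


token = "5YWt5Zub5LqL5Lu2"
print(decode_base64url(token))

# Size check: with a full host-and-variant prefix, the rule would not fit
# within the 64-byte limit mentioned in the text.
page = "User:Liangent-bot/Base64URL/" + token
full_rule = "zh.wikipedia.org/zh-cn/" + page
print(len(page.encode("utf-8")), len(full_rule.encode("utf-8")))
```

The token decodes to 六四事件 (June 4th Incident), which agrees with the redirect target noted above. The page name alone is 44 bytes, and the full rule would be 67 bytes, which would explain the shortened or split forms of the three rules.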


Namespace Category 


Category IS prefix, standard 

Category:political books. 

1 

Category: 4 1 ft zh. wikipedia.org/zh-tw/$term 

Category: Chinese books. 

1 Category: A 

Namespace File 

File:The_Gate_of_Heavenly_Peace prefix.standard

This file is a picture of the Gate of Heavenly Peace, i.e. Tiananmen. 

1 File:The_Gate_of_Heavenly_Peace.jpg

EpochTimes.svg broad, standard 1 File:EpochTimes.svg 

This file is the logo of Epoch Times, a Falun Gong newspaper and website. This rule does not have the “File:” prefix, but it is apparent that the 
intention is to block this file. 

Namespace Talk 



Talk:2013^ CSAiiA)) prefix.standard 2Talk:2013^ (OSAMA)) 

1 User:Liangent-bot/Base64URL/5YWt5Zub5LqL5Lu2 

These three rules all look very odd, because the target string is long (more 

ff¥Wwl 

Talk page for one variation of the Southern Weekly Incident. As discussed in Section 4.2, this rule is actually for "Talk:2013 /l \ : - 

Talk:2013^ (('Sfi/j jW|5|C)) zh.wikipedia.org/zh-hant/Sterm 2Talk:2013^ 

§?¥« Talk:2013¥ 

Talk page for one variation of the Southern Weekly Incident. As discussed in Section 4.2, this rule is actually for “Talk:2013 /l \ : - JH^R)) 

Sr^SfiS)”. 

Talk:2013^ prefix.standard 2Talk:2013¥ (OSi/j M5f0> 

Talk: 2013^ §f¥#?Mi!lgSC*# 

Talk page for one variation of the Southern Weekly Incident. As discussed in Section 4.2, this rule is actually for "Talk:20l 3 /[ f {(Si/jjHzk)) 

sftMSrpj tsawesc*#". 


Talk: Pit jtM prefix, standard 1 

Talk page for Chen Guangcheng, the legendary blind rights defending lawyer. 

Talk:Sj/k-H:ig| prefix, standard 1 

Talk page for the Great Firewall. The main page title “ffi is not on GFW rulebook. 


Talk:/icf5 prefix.standard 2 TalfcikfUI^TK 

Talk page for gunpowder. 

Talk:HPlt^® prefix, standard 1 Talk:lP|i^ll£T^H^0i) 

zh.wikipedia.org/zh-tw/$term 
zh.wikipedia.org/zh-hant/$term 

Talk page for international schools. The main article is not on the GFW rulebook. Why this rule exists is a mystery: neither the talk page nor the main page 
(including the ones for international schools in mainland China) seems sensitive at all. 


Talk:7)5|E^ prefix, standard 1 

zh.wikipedia.org/zh/$term 
zh.wikipedia.org/zh-cn/$term 
zh.wikipedia.org/zh-tw/$term 
zh.wikipedia.org/zh-hk/$term 
zh.wikipedia.org/zh-sg/$term 
zh.wikipedia.org/zh-hans/$term 

Talk page for Song Zuying, a singer and a household name in China. There is a widespread rumor about her relationship with Jiang Zemin. The main 
article is not blocked, even though it does cover that rumor. My hypothesis is that when the censors were studying this page, that paragraph 
did not exist in the main article. Apparently there have been edit wars over whether this rumor should be included. This also suggests that the 
Chinese censors have not revisited the main article for a long time. 


Talk:ifc-p5£ prefix, standard 

Talk page for princelings. 


5 Talk:MI£/2005MI3 Talk:M^/2006MIS Talk:±|F 
MIS3 Talk:MM^IS4 


TaUuffczSrliSL prefix, standard 1 

Talk page for the Weng’an Riot, a mass protest in Weng’an, Guizhou, on Jun 28, 2008. Surprisingly the main article is not on GFW rulebook. 


TalfcH^MfS^l^y# prefix, standard 1 

Talk page for the serial self-immolations in Tibet. The main article title is not on GFW rulebook, but a related term lil is. 


Talk: — prefix, standard 1 

Talk page for “one China”, which is the official principle in China’s dealing with Taiwan affairs. The main page is not on the GFW rulebook, but the 
Talk page has a lot more content that is negative toward the Chinese authorities. 


TalfcS’Bl^fiJdE^'HP' prefix, standard 1 

Talk: S 1 H IS IS SS prefix. standard 0 

Talk pages for two terms about China's Jasmine Revolution. Both main articles are on GFW rulebook (by prefix rule). 

Talk: £ W.ffi prefix, standard 1 

Talk page for “leftist and rightist”. The main article is not on GFW rulebook. 


Table 8.9: GFW Rulebook: Non-Articles 
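
The User:Liangent-bot entry in the table above can be checked by hand. The last path component of that page title is a Base64URL token, and the report states elsewhere that GFW string rules have a 64-byte limit. The following is a minimal sketch, not the author's tooling: it decodes the token (it decodes to the Chinese title of the June 4th Incident article, 六四事件) and measures how long a rule would be if it spelled out the full host, variant path and title. The helper names and the assumption that a rule is formed by substituting the page title for $term are illustrative only.

```python
import base64

GFW_RULE_LIMIT = 64  # bytes; the hard limit reported in the concluding remarks


def decode_base64url(token: str) -> str:
    """Decode a (possibly unpadded) Base64URL token into UTF-8 text."""
    padding = "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(token + padding).decode("utf-8")


def rule_length(template: str, term: str) -> int:
    """Byte length of a rule after substituting the page title for $term.

    Assumption: a rule is the literal template with $term replaced.
    """
    return len(template.replace("$term", term).encode("utf-8"))


page = "User:Liangent-bot/Base64URL/5YWt5Zub5LqL5Lu2"
token = page.rsplit("/", 1)[1]
print("decoded token:", decode_base64url(token))

for variant in ("zh-cn", "zh-hk", "zh-hant"):
    template = f"zh.wikipedia.org/{variant}/$term"
    n = rule_length(template, page)
    verdict = "exceeds limit" if n > GFW_RULE_LIMIT else "fits"
    print(f"{variant}: {n} bytes, {verdict}")
```

Under these assumptions every full-URL form of the rule is longer than 64 bytes, which is consistent with the remark that the operators had to split or shorten the string manually.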
8.10 SELF 

The GFW rulebook contains some terms that do not target any specific website; I call this type “SELF”. Excluding 
URL terms, this set is quite small. Out of this small set, 18 terms affect Wikipedia, and they are listed in this section. 

Intuitively, one might think that these SELF terms must be super-sensitive, but in reality some of them are not. 
I tend to believe that some of these terms are relics from old times, and some are due to a lack of consideration. 


Term GFW Rule Pages Affected 

H 2 Talk:R 

Manifesto of Online Human Rights, a manifesto published by Chinese journalists, writers, scholars and lawyers on Oct 8, 2009. 

g£l®l'l%£ 2Talk:|fi»Ift£ 

Radio Free Asia. 


Mao_The_Unknown_Story Mao_The_Unknown_Story 2-EN:1 File:Mao_The_Unknown_Story.jpg File:Mao_The_Unknown_Story_(Paperback).jpg Mao_The_Unknown_Story(en) 

Book title. Its Chinese title in traditional Chinese is blocked by the standard prefix rule. 

AP AP 3 APtJC Talk:AP 

Sky burial, a traditional funeral ceremony in Tibet. It is also the title of a banned book on Tibetan affairs by Wang Lixiong. 

3 TalkiJcWiS-^* User_talk:SMlEr^M 

Mein Kampf, Adolf Hitler’s autobiographical manifesto. I believe GFW operators used this term for testing in the very early stage of GFW. 

3$5r 0 iH MA 0 iE 2 Talk: 3$$ 0 iB 

Diary in Yan’an, a book by Soviet diplomat Peter Vladimirov, which covers the history of Chinese Communist Party from 1942 to 1945. 

Ultrasurf Ultrasurf 2-EN:4 File:UltraSurf.png Template:User_Ultrasurf UltraSurf(en) Ultrasurf(en) Talk:Ultrasurf(en) User:Ultrasurf2(en) 

A popular censorship-circumvention software. Its Chinese name is blocked by the standard prefix rule. 


Ultrareach Ultrareach 0-EN:1 Ultrareach(en) 

Ultrareach is the name of the company that develops Ultrasurf. 

AJfPlM 2Talk:^p«|^ 

The full Chinese name of Ultrasurf. The short name is blocked by the standard prefix rule. 


Name of the company that develops the popular circumvention software Freegate. Its literal meaning is “dynamic web”. 


6 Ililf-ClSiljH) Ta\k:$&W User:Xiaogang_AU/:§ 

UserJalk:^® User_talk:WfE®S' 


Sheng Xue, dissident activist. 


&i*rm i 

Pangu band, an avant-garde punk band. The traditional version of a related term ISAIAH is also blocked, but not by itself. 

i 

Wang Binyu, a migrant worker who committed homicide and was executed on Oct 19, 2005. Not many people know about or remember his case. 


2 TalkAKEA 

Deng Zhenglai, a scholar who passed away on Jan 24, 2013. This rule has been around for a long time, though. 

%'WK SAAW 2Talk:®/J'« 

Peng Xiaofeng, a high-ranking PLA general. The term’s sensitivity is a mystery. 


mbtX. 2 Talk:¥?M3 

Zhang Qinsheng, a high-ranking PLA general. The term’s sensitivity is a mystery. 


6·4 6·4 0-EN:1 6·4(en) 

This is for the June 4th Incident. The character between 6 and 4 is a middle dot, a.k.a. interpunct or interpoint (Unicode U+00B7). 

^PfxAES TIx/nH 5 Talk^PjxAHfl^ Template:UserAPfX/\|Z3 

Category^^TSAES^iSA 

redress June 4th. 


Table 8.10: GFW Rulebook: SELF 
8.11 URLs 

The GFW rulebook contains a large number of URLs, on the order of 10^3 to 10^4, and many of them affect Wikipedia, 
especially the English Wikipedia’s non-article namespaces such as User. There are too many to include in this report, so 
I only report two types: (1) those that affect pages in the Chinese Wikipedia, and (2) those that affect articles (i.e. Namespace = 0) 
in the English Wikipedia. For the latter type, the third column of the table, “Pages Affected”, only considers articles, 
unlike the previous sections. 

Note that in the GFW rulebook, many URL terms require the dot before the website name, e.g. the leading dot in 
“.businessweek.com”. Many Wikipedia page titles do not contain that dot and thus do not offend GFW; such pages are 
not included in the table below. 
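
The effect of the leading dot can be illustrated with a small sketch. It assumes that a URL term is matched as a case-insensitive substring of the page title; that assumption is inferred from rows of the table below (for example “njuice.com” affecting “ToonJuice.com”) and is a simplification of how GFW actually scans a request. The function name and the title “Businessweek.com” are hypothetical.

```python
def offends(rule: str, page_title: str) -> bool:
    """Assumed semantics: case-insensitive substring match on the title."""
    return rule.lower() in page_title.lower()


examples = [
    (".businessweek.com", "Investing.businessweek.com"),  # row in Table 8.11
    (".businessweek.com", "Businessweek.com"),            # hypothetical title, no dot
    ("blogspot.com", "Cakewrecks.blogspot.com"),          # row in Table 8.11
    ("njuice.com", "ToonJuice.com"),                      # row in Table 8.11
]

for rule, title in examples:
    status = "blocked" if offends(rule, title) else "not blocked"
    print(f"{rule!r} vs {title!r}: {status}")
```

A title that starts directly with the site name has no character before it, so a rule that begins with a dot cannot match it, while a rule without the dot matches both forms.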

The table includes several IP addresses and some abandoned domains. 

Term                     GFW Rule                 Pages Affected 

URLs Affecting Chinese Wikipedia 

67.15.34.251             67.15.34.251             1 User:67.15.34.251 
123.204.163.6            123.204.163.6            1 User_talk:123.204.163.60/i?@ 
204.12.226.163           204.12.226.163           1 User_talk:204.12.226.163/i?@ 
204.74.211.115           204.74.211.115           1 User_talk:204.74.211.115/i?l§ 
.clrc.cc                 .clrc.cc                 1 User_talk:Www.clrc.cc 
.isiwa.cn                .isiwa.cn                2 User:Www.isiwa.cn User_talk:Www.isiwa.cn 
qian.li                  qian.li                  1 User_talk:Qian.lian 

URLs Affecting English Wikipedia Articles 

64tianwang.com           64tianwang.com           0-EN:1 64tianwang.com(en) 
adultfriendfinder.com    adultfriendfinder.com    0-EN:1 Adultfriendfinder.com(en) 
asianews.it              asianews.it              0-EN:1 Asianews.it(en) 
blogspot.com             blogspot.com             0-EN:3 .blogspot.com(en) Blogspot.com(en) Cakewrecks.blogspot.com(en) 
.businessweek.com        .businessweek.com        0-EN:1 Investing.businessweek.com(en) 
crackle.com              crackle.com              0-EN:1 Crackle.com(en) 
cultdeadcow.com          cultdeadcow.com          0-EN:1 Cultdeadcow.com(en) 
epochtimes.com           epochtimes.com           0-EN:1 Epochtimes.com(en) 
.facebook.com            .facebook.com            0-EN:1 Www.facebook.com(en) 
favstar.fm               favstar.fm               0-EN:1 Favstar.fm(en) 
fleshbot.com             fleshbot.com             0-EN:1 Fleshbot.com(en) 
.fulltiltpoker.com       .fulltiltpoker.com       0-EN:1 Www.fulltiltpoker.com(en) 
ishr.org                 ishr.org                 0-EN:1 Ishr.org(en) 
ladbrokes.com            ladbrokes.com            0-EN:1 Ladbrokes.com_Championship(en) 
megavideo.com            megavideo.com            0-EN:1 Megavideo.com(en) 
njuice.com               njuice.com               0-EN:1 ToonJuice.com(en) 
.nrk.no                  .nrk.no                  0-EN:1 Www.nrk.no(en) 
.nytimes.com             .nytimes.com             0-EN:1 Www.nytimes.com(en) 
pastebin.com             pastebin.com             0-EN:1 Pastebin.com(en) 
phayul.com               phayul.com               0-EN:1 Phayul.com(en) 
radiobeta.com            radiobeta.com            0-EN:1 Radiobeta.com(en) 
rebuildhk.com            rebuildhk.com            0-EN:1 Rebuildhk.com(en) 
tinychat.com             tinychat.com             0-EN:1 Tinychat.com(en) 
twitoaster.com           twitoaster.com           0-EN:1 Twitoaster.com(en) 
twitpic.com              twitpic.com              0-EN:1 Twitpic.com(en) 
twitterfall.com          twitterfall.com          0-EN:1 Twitterfall.com(en) 
veoh.com                 veoh.com                 0-EN:1 Veoh.com(en) 
.youporn.com             .youporn.com             0-EN:1 Www.youporn.com(en) 
.youtube.com             .youtube.com             0-EN:1 Www.youtube.com(en) 
youtu.be                 youtu.be                 0-EN:1 Youtu.be(en) 

Table 8.11: GFW Rulebook: URLs 
9 Concluding Remarks 

In this study, we examined the entire Wikipedia corpus (Chinese version and English version) and revealed the complete 
and exact GFW rulebook for Wikipedia. In addition, we examined GFW’s HTTP response filtering scheme thoroughly 
and surfaced a small but comprehensive list. A sample of notable findings: 

• There are 78 terms for which GFW blocks a non-standard variant but not the canonical path. These are cases where the 
censors intend to block a page but the block does not actually take effect, suggesting that the censors have a poor understanding of 
Wikipedia’s content and serving system. 

• Many obscure non-article pages are blocked, which raises suspicion that these pages were provided to the 
censorship bureaucrats by Wikipedia editors who are very familiar with the content (e.g. those who participated 
in the edit wars and/or discussions regarding self-censorship proposals). 

• GFW string matching rules have a hard size limit of 64 bytes. 

• GFW’s HTTP request filtering and HTTP response filtering are two separate systems. The latter has a lot more 
heterogeneity. 
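
The first finding, a rule that covers a non-standard variant but misses the canonical path, can be sketched as follows. This is an illustration under stated assumptions, not the probing code used in the study: the variant list is taken from the rule templates that appear in Section 8, the canonical path is assumed to be the /wiki/ form, rules are assumed to be plain substring matches, and the page title “Example_Page” is hypothetical.

```python
# Path components seen in the rulebook tables, plus the assumed canonical "wiki".
VARIANTS = ["wiki", "zh", "zh-cn", "zh-tw", "zh-hk", "zh-sg", "zh-hans", "zh-hant"]
CANONICAL = "wiki"  # assumption: zh.wikipedia.org/wiki/<title> is the canonical path


def blocked_variants(rule_templates, term):
    """Return the variants whose URL contains at least one instantiated rule."""
    rules = [t.replace("$term", term) for t in rule_templates]
    blocked = []
    for variant in VARIANTS:
        url = f"zh.wikipedia.org/{variant}/{term}"
        if any(rule in url for rule in rules):
            blocked.append(variant)
    return blocked


def misses_canonical(rule_templates, term):
    """True when some variant is blocked but the canonical path is not."""
    blocked = blocked_variants(rule_templates, term)
    return bool(blocked) and CANONICAL not in blocked


templates = ["zh.wikipedia.org/zh-hk/$term"]  # shape of a rule listed in Section 8
print(blocked_variants(templates, "Example_Page"))
print(misses_canonical(templates, "Example_Page"))
```

A rule written against a single variant path leaves every other path to the same article reachable, which is the situation the 78 terms describe.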

The biggest lesson from this study, in my opinion, is that GFW operation is haphazard and ill-maintained. Also, 
there are many indications that the GFW operators are somewhat disconnected from the censorship bureaucrats. 

We hope these findings will be of interest to internet censorship watchers, Wikipedia researchers, China observers, 
and ordinary Chinese citizens. 

9.1 Future Work 

• The methodology in this study can be applied to more than just Wikipedia. 

• Naturally, it would be useful to have a monitoring system to track changes in GFW’s filtering rules. Using the 
methodology in this report, the monitoring can be much more efficient than checking individual Wikipedia URLs, as in 
Greatfire.org’s current approach. 

• Given our research result on GFW’s HTTP response filtering, we learn that the response filtering happens in a 
more distributed fashion and different ISPs may have deployed different filtering rules. By examining which 
rules are effective in which regions, we will be able to learn more on GFW topology. 

• The other type of work is from the social and political perspective. Because the list reported in this document 
is complete, we know not only which topics are censored, but also which topics are not censored. This 
powerful comparison will give us deep insights into the Chinese censors, e.g., what their priorities are. 
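
As an illustration of the monitoring idea in the second item above, once a complete rulebook can be extracted, tracking changes reduces to comparing two snapshots. The sketch below is hypothetical: the two snapshots are made up for illustration (the rule strings are borrowed from Table 8.11), and the function name is not from the study.

```python
def diff_rulebooks(old_rules, new_rules):
    """Compare two rulebook snapshots and report added and removed rules."""
    old_set, new_set = set(old_rules), set(new_rules)
    return {
        "added": sorted(new_set - old_set),
        "removed": sorted(old_set - new_set),
    }


# Hypothetical snapshots; the real rulebook has hundreds of entries.
snapshot_a = [".youtube.com", "youtu.be", "pastebin.com"]
snapshot_b = [".youtube.com", "youtu.be", "twitpic.com"]

changes = diff_rulebooks(snapshot_a, snapshot_b)
print("added:  ", changes["added"])
print("removed:", changes["removed"])
```

Comparing rule sets directly avoids re-probing every Wikipedia URL after each change, which is the efficiency gain the item refers to.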

9.2 Acknowledgements 

This research is a solo project over a short period of time, so this is a short list. The author thanks Jedidiah Crandall et 
al. for their excellent ConceptDoppler paper [5], and the operators of the Greatfire.org site for diligently tracking GFW 
and China censorship. Furthermore, I would like to mention Dr Xu Zhiyong (English Wikipedia, Chinese Wikipedia), the 
iconic figure of China’s New Citizen Movement. His passion, courage and coolness are an inspiration to the author, and 
in particular, the fact that his Wikipedia entry is currently not blocked motivated me to conduct a thorough study of 
GFW’s blacklist for Wikipedia. Indeed, the Chinese censors do intend to block the Xu Zhiyong article, but they only block a 
non-standard variant. 

References 

[1] Greatfire.org. https://www.greatfire.org (since 2011). 

[2] Citizen Lab. https://china-chats.net (since Jul 2013). 

[3] Crandall, J. et al. (2013). Chat program censorship and surveillance in China: Tracking TOM-Skype and Sina UC. 
http://firstmonday.org/ojs/index.php/fm/article/view/4628/3727. 

[4] China Digital Times. http://chinadigitaltimes.net/chinese/category/RtWSIJ!/®i!f<wlJ¥/ (since Apr 2011). 

[5] Crandall, J. et al. (2007). ConceptDoppler: A Weather Tracker for Internet Censorship. 
http://www.cs.unm.edu/~crandall/concept_doppler_ccs07.pdf. 

[6] Zittrain, J. and Edelman, B. (2002). Empirical Analysis of Internet Filtering in China. 

[7] Global Internet Freedom Consortium (2002). The Great Firewall Revealed. 
http://www.internetfreedom.org/files/WhitePaper/ChinaGreatFirewallRevealed.pdf. 




Appendices 

A Diagnosis of Greatfire.org’s Wikipedia List 

Greatfire.org keeps track of the status of ~ 700 Wikipedia pages, out of which 393 pages are claimed to be 100% blocked or partially 
blocked. I examined these pages thoroughly. The following is the result: 

• 240 pages offend GFW’s rulebook of HTTP request scan, so they are not accessible from China. A user trying to load these 
pages will immediately get an error page in the browser. In a probing session, we will get an instant connection failure. 

• 14 articles are accessible from China. They were tested from more than 20 different proxy IPs on 2013/10/13 and 
2013/10/17. For these pages, Greatfire’s “blockage percentage” ranges from 10% to 56%. This percentage is averaged over 
multiple tests. There can be a few explanations: for example, the page content may have contained offensive terms in the past 
that are no longer there, or there may be a certain level of noise in Greatfire.org’s testing. 

• 139 articles are interrupted by GFW. A user can generally load part of the page, but it would hang and/or get reset after a 
while. We thoroughly examined these pages by probing, and they provide a very valuable source for our study of GFW's 
HTTP response filtering (Section 5). 
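
The three groups above correspond to three observable probe outcomes. The sketch below shows one way to encode that classification and to check that the reported counts add up to the 393 pages examined. The function signature and the two observables (bytes received and whether the load completed) are assumptions made for illustration; the report does not describe the probing code at this level of detail.

```python
from enum import Enum


class Outcome(Enum):
    REQUEST_BLOCKED = "offends HTTP request scan"
    RESPONSE_INTERRUPTED = "interrupted by HTTP response filtering"
    ACCESSIBLE = "accessible"


def classify(bytes_received: int, completed: bool) -> Outcome:
    """Map one probe observation to one of the three groups.

    Assumed observables: an instant connection failure yields zero bytes;
    an interrupted page yields some bytes but never completes.
    """
    if completed:
        return Outcome.ACCESSIBLE
    if bytes_received == 0:
        return Outcome.REQUEST_BLOCKED
    return Outcome.RESPONSE_INTERRUPTED


# Counts reported in this appendix.
reported = {
    Outcome.REQUEST_BLOCKED: 240,
    Outcome.ACCESSIBLE: 14,
    Outcome.RESPONSE_INTERRUPTED: 139,
}

total = sum(reported.values())
print(total)  # should equal the 393 pages claimed blocked by Greatfire.org
```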

I put these results on a Google spreadsheet at http://goo.gl/jUJpHb. This spreadsheet contains all entries that are reported as 
blocked or partially blocked by Greatfire.org. They are grouped into two sheets. The first contains those that do not offend GFW’s 
HTTP request scan and the second contains those that do. The order on each sheet is Greatfire.org’s default order, except for the 14 
accessible articles, which are put at the bottom of the first sheet. For the pages that are inaccessible due to GFW’s HTTP response 
filtering, we list the offending string as well. In cases where the page contains multiple offending strings, we list the first 
occurrence. 
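
Picking the first occurrence among several offending strings can be sketched as a search for the earliest match position in the page body. This is a minimal illustration that treats the body as plain text and the rules as literal substrings; the placeholder rule names and the sample body are hypothetical, and real response filtering operates on the byte stream.

```python
from typing import Optional, Sequence


def first_offending(body: str, rules: Sequence[str]) -> Optional[str]:
    """Return the rule whose first match appears earliest in the body."""
    hits = [(body.find(rule), rule) for rule in rules if rule in body]
    if not hits:
        return None
    # min() compares positions first, so the earliest match wins.
    return min(hits)[1]


sample_body = "intro text RULE_B more text RULE_A closing text"  # hypothetical
print(first_offending(sample_body, ["RULE_A", "RULE_B"]))
print(first_offending(sample_body, ["RULE_C"]))
```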


B Self Censorship Efforts on Chinese Wikipedia 

During this study, we encountered many records which indicate self censorship efforts on Chinese Wikipedia. An excellent feature 
of Wikipedia is that all edits are recorded and most can be retrieved, so Wikipedia becomes an extremely valuable resource for studying 
the cultural phenomenon of self censorship in China. 

According to the Washington Post news article “Reference Tool On Web Finds Fans, Censors” (Feb 20, 2006), “Wikipedia 
received positive coverage in China’s state press in early 2004, but it was blocked on 3 June 2004... Proposals to practice 
self-censorship in a bid to restore the site were rejected by the Chinese Wikipedia community.” In our study, we found records of 
such proposals, e.g. Wikipedia:®P^/2005^ (chat archive of 2005), also another relevant page: Wikipedia:® 

3jt/2007^6fl 4 0 (page delete proposal archive on June 4th, 2007). The debates were lengthy and heated. Both pages are 
intentionally blocked by GFW. 

We see proposals like: 

• Maybe we can make Chinese Wikipedia an “apolitical” encyclopedia, e.g., empty all sensitive articles and lock them up. 

• Wikipedia should obey local laws. We should monitor and modify content that breaks laws, and thus avoid being blocked. 

• Stick to the Wikipedia spirit and cooperate with the authorities (sounds a bit funny, no?). 

From the discussion records, we can see that this is not a minority sentiment. There can be three types of users who demand 
self censorship: 

• A: Those who believe in Wikipedia spirit, but feel a compromise may work better for China in the long run. 

• B: Those who believe in the Party and voluntarily toe the Party line, due to their education/China’s propaganda. 

• C: Those who are affiliated with or sponsored by the Chinese authorities. I cannot find evidence for this type from cursory browsing, 
though. 

After the lengthy blockage and the emergence of Baidu Baike (Baidu’s copycat of Wikipedia), many Type B editors migrated 
to Baidu Baike. But after the ban was lifted, some Type B editors returned. Anecdotes show that there are more heated edit wars over 
sensitive articles not blocked by GFW than over blocked ones. As a result, these unblocked articles become more favorable to the 
Chinese authorities than they otherwise would be. 

A final note: my gut feeling is that certain Chinese Wikipedia editors have helped the Chinese censors by providing lists of 
sensitive pages for blocking. The basis of this speculation is the existence of those obscure Wikipedia non-article pages on the GFW 
rulebook (Section 8.9). I am an adamant opponent of GFW and internet censorship, but I feel that a Wikipedia with most of its content 
accessible to Chinese users and only a small number of pages blocked could still be a net positive for China. 